[ https://issues.apache.org/jira/browse/SPARK-57165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086540#comment-18086540 ]
Uroš Bojanić commented on SPARK-57165:
--------------------------------------
Issue resolved by pull request 56298
https://github.com/apache/spark/pull/56298
> Add LiteralGenerator support for nanosecond-capable timestamp types
> -------------------------------------------------------------------
>
> Key: SPARK-57165
> URL: https://issues.apache.org/jira/browse/SPARK-57165
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Minor
> Labels: pull-request-available
>
> h2. Summary
> Extend the test-only {{LiteralGenerator}} (in
> {{sql/catalyst/src/test/.../expressions/LiteralGenerator.scala}}) to produce
> random {{Literal}}s for the nanosecond-capable timestamp types
> {{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}} (p in [7, 9]).
> Test code only - no user-facing API change.
> h2. Background
> {{LiteralGenerator.randomGen(dt)}} is the literal source for ScalaCheck
> property checks across expression suites (interpreted-vs-codegen consistency
> via {{ExpressionEvalHelper}}, ordering/predicate/hash suites, etc.). Among the
> timestamp types it currently handles only the microsecond variants; any other
> type falls through to the default case and throws:
> {code:java}
> case TimestampType => timestampLiteralGen
> case TimestampNTZType => timestampNTZLiteralGen
> ...
> case dt => throw new IllegalArgumentException(s"not supported type $dt")
> {code}
> So {{randomGen(TimestampNTZNanosType(9))}} and
> {{randomGen(TimestampLTZNanosType(7))}} currently throw
> {{IllegalArgumentException}}, and no property-based suite can exercise the
> nanos types.
> Two further limitations to address:
> * No nanosecond literal generator exists at all.
> * The existing micro generators derive from {{millisGen}} (millisecond-grained),
> so they never produce sub-millisecond fractional digits (every generated
> microsecond value is a multiple of 1000). The new generators must produce full
> sub-microsecond variation.
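> To illustrate the second point: if a microsecond value is built as millis * 1000, its last three digits are always zero. This is only a sketch of the arithmetic; the helper names ({{microsFromMillis}}, {{subMillisMicros}}) are hypothetical and do not mirror the real {{millisGen}} code.

```scala
// Sketch of why a millisecond-grained source loses sub-millisecond digits.
// Helper names are hypothetical; this is not Spark's millisGen.
object MillisGrainDemo {
  // Micros derived from millis are always a multiple of 1000.
  def microsFromMillis(millis: Long): Long = Math.multiplyExact(millis, 1000L)

  // The sub-millisecond part of a micro value, in [0, 999]. floorMod keeps
  // the result non-negative for pre-1970 (negative) values too.
  def subMillisMicros(micros: Long): Long = Math.floorMod(micros, 1000L)

  def main(args: Array[String]): Unit = {
    val rnd = new java.util.SplittableRandom(42L)
    // Millis in [0001-01-01, 10000-01-01) UTC.
    val samples = Seq.fill(1000)(
      microsFromMillis(rnd.nextLong(-62135596800000L, 253402300800000L)))
    // Prints true: every sample has zero sub-millisecond micros.
    println(samples.forall(m => subMillisMicros(m) == 0L))
  }
}
```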
> The row/value-level counterpart ({{RandomDataGenerator}}) and the shared
> {{TimestampNanosTestUtils}} helper / {{specialNanosTs}} corpus were already
> added by SPARK-57034; this ticket is the expression-literal counterpart and
> should reuse those helpers where practical.
> h2. Scope
> * Add {{timestampLTZNanosLiteralGen(precision: Int)}} and
> {{timestampNTZNanosLiteralGen(precision: Int)}}, producing
> {{Literal}}s whose Catalyst value is
> {{org.apache.spark.unsafe.types.TimestampNanosVal(epochMicros, nanosOfMicro)}}
> with the matching data type. (Construct the literal with the internal
> {{TimestampNanosVal}}; do not rely on java.time external conversion, which is
> tracked separately under SPARK-57033.)
> * Wire them into {{{}randomGen{}}}:
> {code:java}
> case t: TimestampNTZNanosType => timestampNTZNanosLiteralGen(t.precision)
> case t: TimestampLTZNanosType => timestampLTZNanosLiteralGen(t.precision)
> {code}
> * Value distribution:
> ** {{epochMicros}}: reuse the existing valid-range bounds
> ([0001-01-01 .. 9999-12-31]) used by the micro generators.
> ** {{nanosOfMicro}}: random in [0, 999], biased to include the edge values
> {0, 1, 999} (note that 1 and 999 are valid only for p=9; see the precision
> rule below).
> ** Respect the declared precision {{p}} so generated values are valid for the
> type: p=7 -> {{nanosOfMicro}} multiple of 100, p=8 -> multiple of 10,
> p=9 -> any value in [0, 999].
> ** Mix in entries from {{TimestampNanosTestUtils.specialNanosTs}}
> (SPARK-57034).
> Keep all values normalized ({{{}nanosOfMicro{}}} in [0, 999]).
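> A minimal sketch of the value distribution above, using plain {{java.util.SplittableRandom}} instead of ScalaCheck {{Gen}} so it runs standalone. {{NanosValSketch}} is a hypothetical stand-in for {{TimestampNanosVal}} (its field types are assumptions), the bounds assume [0001-01-01T00:00:00, 9999-12-31T23:59:59.999999] in UTC, and the {{specialNanosTs}} mix-in is not modeled. Because 1 and 999 are not multiples of 100 or 10, the sketch truncates the edge values down to the precision's step (p=7 yields {0, 900}, p=8 yields {0, 990}); that is one possible reading of the spec, not a decision recorded in the ticket.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.util.SplittableRandom

// Hypothetical stand-in for org.apache.spark.unsafe.types.TimestampNanosVal;
// the field types are assumptions.
final case class NanosValSketch(epochMicros: Long, nanosOfMicro: Int)

object NanosLiteralGenSketch {
  // Assumed bounds: 0001-01-01T00:00:00 .. 9999-12-31T23:59:59.999999 (UTC),
  // as microseconds since 1970-01-01.
  val MinMicros: Long =
    LocalDateTime.of(1, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC) * 1000000L
  val MaxMicros: Long =
    LocalDateTime.of(9999, 12, 31, 23, 59, 59).toEpochSecond(ZoneOffset.UTC) * 1000000L + 999999L

  // Granularity of nanosOfMicro: p=7 -> 100, p=8 -> 10, p=9 -> 1.
  def step(precision: Int): Int = precision match {
    case 7 => 100
    case 8 => 10
    case 9 => 1
    case p => throw new IllegalArgumentException(s"precision $p not in [7, 9]")
  }

  // Edge values {0, 1, 999}, truncated to a multiple of the step so they
  // stay valid for the precision.
  def edges(precision: Int): Seq[Int] = {
    val s = step(precision)
    Seq(0, 1, 999).map(v => v / s * s).distinct
  }

  // About 1 in 4 draws picks an edge value; the rest are uniform over the
  // multiples of the step in [0, 999].
  def nanosOfMicro(precision: Int, rnd: SplittableRandom): Int = {
    val s = step(precision)
    val e = edges(precision)
    if (rnd.nextInt(4) == 0) e(rnd.nextInt(e.size))
    else rnd.nextInt(1000 / s) * s
  }

  // epochMicros is uniform over [MinMicros, MaxMicros].
  def gen(precision: Int, rnd: SplittableRandom): NanosValSketch =
    NanosValSketch(rnd.nextLong(MinMicros, MaxMicros + 1), nanosOfMicro(precision, rnd))
}
```

> For example, {{NanosLiteralGenSketch.gen(7, new SplittableRandom(1L))}} always yields a {{nanosOfMicro}} that is a multiple of 100. In the real generator these draws would presumably be expressed with ScalaCheck combinators (e.g. {{Gen.frequency}}) and the result wrapped in a {{Literal}} of the matching type.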
> h2. Acceptance criteria
> * For p in \{7, 8, 9}, {{randomGen(TimestampNTZNanosType(p))}} and
> {{randomGen(TimestampLTZNanosType(p))}} return generators that produce
> {{Literal}}s of the correct type carrying {{TimestampNanosVal}} values with
> visible nanosecond variation (and edge values \{0, 1, 999} appearing).
> * Generated values are valid for the declared precision and normalized.
> * Existing {{randomGen}} cases for {{TimestampType}} / {{TimestampNTZType}} are
> unchanged.
> * At least one property-based suite is extended (or a small targeted test
> added) to confirm a nanos type round-trips through interpreted vs codegen
> evaluation using the new generator.
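> For the "valid for the declared precision and normalized" criterion, a test could check each generated value with predicates like the ones below. This is a sketch: {{NanosParts}} is a hypothetical stand-in for {{TimestampNanosVal}}, and {{fromEpochNanos}} is an illustrative helper, not an existing Spark API. It uses floor division so that instants before 1970 still get a {{nanosOfMicro}} in [0, 999]; plain {{/}} and {{%}} would give {{nanosOfMicro = -1}} for an instant 1 ns before the epoch.

```scala
// Hypothetical stand-in for TimestampNanosVal; field types are assumptions.
final case class NanosParts(epochMicros: Long, nanosOfMicro: Int)

object NanosValidity {
  // Illustrative helper: split nanoseconds since the epoch using floor
  // semantics, so nanosOfMicro is in [0, 999] even for negative inputs.
  def fromEpochNanos(epochNanos: Long): NanosParts =
    NanosParts(Math.floorDiv(epochNanos, 1000L), Math.floorMod(epochNanos, 1000L).toInt)

  def isNormalized(v: NanosParts): Boolean =
    v.nanosOfMicro >= 0 && v.nanosOfMicro <= 999

  // Valid for precision p (7..9) when the digits below 10^-p seconds are
  // zero, i.e. nanosOfMicro is a multiple of 10^(9 - p).
  def isValidFor(v: NanosParts, precision: Int): Boolean = {
    require(precision >= 7 && precision <= 9, s"precision $precision not in [7, 9]")
    val step = Seq.fill(9 - precision)(10).product
    isNormalized(v) && v.nanosOfMicro % step == 0
  }
}
```

> A ScalaCheck property over the new generator would then assert {{isValidFor(v, p)}} for every generated value.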
> h2. Out of scope
> * {{RandomDataGenerator}} and {{TimestampNanosTestUtils}} (already delivered
> by SPARK-57034).
> * Any production code or behavior change.
> h2. Notes for first-time contributors
> Good first issue - test-only. Run an affected suite with SBT, e.g.:
> {code:java}
> build/sbt 'catalyst/testOnly *LiteralExpressionSuite'
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)