This is an automated email from the ASF dual-hosted git repository.
uros-b pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b4777a99deef [SPARK-57165][SQL][TEST] Add LiteralGenerator support for nanosecond-capable timestamp types
b4777a99deef is described below
commit b4777a99deefedbf2bd75c18ee5a98f2b16a03c0
Author: Maxim Gekk <[email protected]>
AuthorDate: Wed Jun 3 20:00:46 2026 +0200
[SPARK-57165][SQL][TEST] Add LiteralGenerator support for nanosecond-capable timestamp types
### What changes were proposed in this pull request?
This PR extends the test-only `LiteralGenerator` (in
`sql/catalyst/src/test/.../expressions/LiteralGenerator.scala`) to produce
random `Literal`s for the nanosecond-capable timestamp types
`TimestampNTZNanosType(p)` and `TimestampLTZNanosType(p)` (p in `[7, 9]`), and
wires them into `randomGen`:
```scala
case t: TimestampNTZNanosType => timestampNTZNanosLiteralGen(t.precision)
case t: TimestampLTZNanosType => timestampLTZNanosLiteralGen(t.precision)
```
Details:
- New `timestampNTZNanosLiteralGen(precision)` /
`timestampLTZNanosLiteralGen(precision)` build `Literal`s of the matching data
type whose Catalyst value is an
`org.apache.spark.unsafe.types.TimestampNanosVal(epochMicros, nanosWithinMicro)`.
Random values are built directly from these internal parts via
`TimestampNanosVal.fromParts`, not through an external `java.time` conversion.
- A new microsecond-grained `microsGen` provides full sub-millisecond
variation (the existing `millisGen` is millisecond-grained), using the same
valid range `[0001-01-01 .. 9999-12-31]` as the microsecond timestamp generators.
- `nanosWithinMicro` is random in `[0, 999]`, biased to include the edge
values `{0, 1, 999}`, and respects the declared precision (`p=7` -> multiple of
100, `p=8` -> multiple of 10, `p=9` -> any), reusing
`TimestampNanosTestUtils.nanoOfSecTruncator`.
- Entries from `TimestampNanosTestUtils.specialNanosTs` (SPARK-57034),
truncated to the declared precision, are mixed in with the random values.
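Below is a minimal plain-Scala sketch of the random half of this scheme, aimed at readers who do not have the Spark test tree at hand. It is not the PR's code. ScalaCheck's `Gen` is replaced by `java.util.SplittableRandom`, and the `NanosGenSketch` object and its `truncate` helper are hypothetical stand-ins. They only mimic the behavior described above for `microsGen`, `nanosWithinMicroGen`, and `nanoOfSecTruncator`.

```scala
import java.time.Instant
import java.util.SplittableRandom

// Hypothetical stand-in for the random branch of the nanos literal generator.
// Not Spark's code: Gen is replaced by SplittableRandom, and `truncate` only
// mimics the precision rule described for `nanoOfSecTruncator`.
object NanosGenSketch {
  val MaxNanosWithinMicro = 999

  // Epoch microseconds of an instant: whole seconds plus micro-of-second.
  def toMicros(i: Instant): Long = i.getEpochSecond * 1000000L + i.getNano / 1000

  // Same valid range as the microsecond timestamp generators.
  val minMicros: Long = toMicros(Instant.parse("0001-01-01T00:00:00.000000Z"))
  val maxMicros: Long = toMicros(Instant.parse("9999-12-31T23:59:59.999999Z"))

  // TIMESTAMP(p) keeps p - 6 sub-microsecond digits:
  // p=7 -> multiple of 100, p=8 -> multiple of 10, p=9 -> unchanged.
  def truncate(precision: Int)(nanos: Int): Int = {
    require(precision >= 7 && precision <= 9, s"unsupported precision $precision")
    val step = Seq(100, 10, 1)(precision - 7)
    nanos - nanos % step
  }

  // Half of the draws pick an edge value {0, 1, 999}, half are uniform in [0, 999];
  // either way the result is truncated to the declared precision.
  def nanosWithinMicro(precision: Int, rnd: SplittableRandom): Int = {
    val raw =
      if (rnd.nextBoolean()) Seq(0, 1, MaxNanosWithinMicro)(rnd.nextInt(3))
      else rnd.nextInt(MaxNanosWithinMicro + 1)
    truncate(precision)(raw)
  }

  // Uniform epoch micros over the whole range (microsecond-grained) paired with
  // a precision-valid nanosWithinMicro.
  def randomParts(precision: Int, rnd: SplittableRandom): (Long, Int) =
    (rnd.nextLong(minMicros, maxMicros + 1), nanosWithinMicro(precision, rnd))
}
```

Drawing the epoch value directly in microseconds is what yields sub-millisecond variation. A millisecond generator scaled by 1000 always leaves the last three microsecond digits at zero.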
The existing `randomGen` cases for `TimestampType` / `TimestampNTZType` are
unchanged. The row/value-level counterpart (`RandomDataGenerator`) and the
shared `TimestampNanosTestUtils` helpers were already added by SPARK-57034;
this PR is the expression-literal counterpart and reuses those helpers.
### Why are the changes needed?
`LiteralGenerator.randomGen` is the literal source for ScalaCheck property
checks across expression suites (interpreted-vs-codegen consistency via
`ExpressionEvalHelper`, ordering/predicate/hash suites, etc.). Previously it
threw `IllegalArgumentException` for the nanos types, so no property-based
suite could exercise `TimestampNTZNanosType` / `TimestampLTZNanosType`. This is
a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).
### Does this PR introduce _any_ user-facing change?
No. Test-only change; no production code or behavior change.
### How was this patch tested?
- Added a targeted test in `LiteralExpressionSuite` that, for each
precision in `{7, 8, 9}` and both nanos types, confirms that the generated
literals evaluate identically under interpreted and codegen evaluation, are
valid for the declared precision, and expose visible nanosecond variation,
including the edge values `{0, 1, 999}` at full precision.
- Ran:
```
build/sbt 'catalyst/testOnly *LiteralExpressionSuite *DateExpressionsSuite'
```
All 122 tests pass.
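The edge-value check samples 5000 literals per type. A rough bound suggests why it should not be flaky. The sketch below is a back-of-the-envelope calculation, not part of the PR. It assumes that ScalaCheck's `Gen.oneOf` over generators picks each alternative with equal probability and that samples are independent. It also ignores the `specialNanosTs` corpus, which can only add hits. `EdgeCoverageBound` is a hypothetical name.

```scala
// Back-of-the-envelope bound for the "edge values must show up" assertion.
// Assumptions (not verified here): Gen.oneOf over generators is uniform, samples
// are independent, and the special corpus is ignored (it can only add hits).
object EdgeCoverageBound {
  // P(a given edge value in {0, 1, 999} is drawn in one sample) at p = 9:
  //   1/2 (random branch vs. special corpus)
  //   * (1/2 * 1/3 (edge set) + 1/2 * 1/1000 (uniform draw))
  val perSample: Double = 0.5 * (0.5 / 3 + 0.5 / 1000)

  // Natural log of P(that edge value never appears in n samples).
  def logMissProbability(n: Int): Double = n * math.log1p(-perSample)

  // Union bound over the three edge values: log(3 * P(one given edge missing)).
  def logAnyEdgeMissingBound(n: Int): Double = math.log(3.0) + logMissProbability(n)
}
```

With `n = 5000` the log-probability of missing any edge value is roughly -435, so a spurious failure is not a practical concern under these assumptions.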
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor
Closes #56298 from MaxGekk/nanos-lit-gen.
Authored-by: Maxim Gekk <[email protected]>
Signed-off-by: Uros Bojanic <[email protected]>
---
.../expressions/LiteralExpressionSuite.scala | 34 +++++++++++
.../catalyst/expressions/LiteralGenerator.scala | 65 +++++++++++++++++++++-
2 files changed, 96 insertions(+), 3 deletions(-)
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala
index d364f7830dec..40d7131269f7 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala
@@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.encoders.ExamplePointUDT
import org.apache.spark.sql.catalyst.util.DateTimeConstants._
import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.localTime
import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.types._
import org.apache.spark.sql.types.DayTimeIntervalType._
@@ -101,6 +102,39 @@ class LiteralExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
checkEvaluation(Literal.default(VarcharType(5)), "")
}
+ test("SPARK-57165: random literals for nanosecond-capable timestamp types") {
+ TimestampNanosTestUtils.foreachNanosPrecision { precision =>
+ val truncate = TimestampNanosTestUtils.nanoOfSecTruncator(precision)
+ Seq(TimestampNTZNanosType(precision), TimestampLTZNanosType(precision)).foreach { dt =>
+ val gen = LiteralGenerator.randomGen(dt)
+ // Interpreted and codegen evaluation of the generated literals must agree.
+ forAll(gen) { (lit: Literal) =>
+ assert(lit.dataType === dt)
+ val v = lit.value.asInstanceOf[TimestampNanosVal]
+ val nanos = v.nanosWithinMicro.toInt
+ assert(nanos >= 0 && nanos <= TimestampNanosVal.MAX_NANOS_WITHIN_MICRO,
+ s"nanosWithinMicro $nanos out of range for $dt")
+ assert(truncate(nanos) == nanos,
+ s"nanosWithinMicro $nanos is not valid for precision $precision")
+ cmpInterpretWithCodegen(EmptyRow, lit)
+ }
+ // The generator must expose visible, precision-valid nanosecond variation.
+ val sampled = (1 to 5000)
+ .flatMap(_ => gen.sample)
+ .map(_.value.asInstanceOf[TimestampNanosVal].nanosWithinMicro.toInt)
+ .toSet
+ assert(sampled.size > 1, s"expected nanosecond variation for $dt")
+ assert(sampled.forall(n => n >= 0 && truncate(n) == n))
+ // At full precision the edge values {0, 1, 999} must show up.
+ if (precision == TimestampNTZNanosType.NANOS_PRECISION) {
+ Seq(0, 1, TimestampNanosVal.MAX_NANOS_WITHIN_MICRO).foreach { edge =>
+ assert(sampled.contains(edge), s"expected edge value $edge for $dt")
+ }
+ }
+ }
+ }
+ }
+
test("boolean literals") {
checkEvaluation(Literal(true), true)
checkEvaluation(Literal(false), false)
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
index 49d6c73f506c..1e082e0f1163 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
@@ -18,7 +18,7 @@
package org.apache.spark.sql.catalyst.expressions
import java.sql.{Date, Timestamp}
-import java.time.{Duration, Instant, LocalDate, LocalTime, Period}
+import java.time.{Duration, Instant, LocalDate, LocalTime, Period, ZoneId}
import java.util.concurrent.TimeUnit
import org.scalacheck.{Arbitrary, Gen}
@@ -26,9 +26,10 @@ import org.scalatest.Assertions._
import org.apache.spark.sql.catalyst.util.DateTimeConstants.{MICROS_PER_MILLIS, MILLIS_PER_DAY, NANOS_PER_MICROS}
import org.apache.spark.sql.catalyst.util.DateTimeUtils
-import org.apache.spark.sql.catalyst.util.DateTimeUtils.{localTimeToNanos, nanosToMicros}
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.{instantToMicros, localTimeToNanos, nanosToMicros}
+import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils
import org.apache.spark.sql.types._
-import org.apache.spark.unsafe.types.CalendarInterval
+import org.apache.spark.unsafe.types.{CalendarInterval, TimestampNanosVal}
/**
* Property is a high-level specification of behavior that should hold for a range of data points.
@@ -154,6 +155,62 @@ object LiteralGenerator {
DateTimeUtils.microsToLocalDateTime(millis * MICROS_PER_MILLIS),
TimestampNTZType)
}
+ // Microsecond-grained epoch generator for the nanosecond timestamp types. Unlike `millisGen`,
+ // which is millisecond-grained and therefore never yields sub-millisecond fractional digits,
+ // this draws over the full microsecond range so generated values exercise sub-millisecond
+ // variation. Bounds match the valid range used by the microsecond generators.
+ private def microsGen = {
+ val minMicros = instantToMicros(Instant.parse("0001-01-01T00:00:00.000000Z"))
+ val maxMicros = instantToMicros(Instant.parse("9999-12-31T23:59:59.999999Z"))
+ Gen.choose(minMicros, maxMicros)
+ }
+
+ // Generates a `nanosWithinMicro` value in [0, 999], biased to include the edge values
+ // {0, 1, 999}, and truncated to the declared precision so the result is valid for
+ // TIMESTAMP(precision): p=7 -> multiple of 100, p=8 -> multiple of 10, p=9 -> any value.
+ private def nanosWithinMicroGen(precision: Int): Gen[Int] = {
+ val truncate = TimestampNanosTestUtils.nanoOfSecTruncator(precision)
+ Gen.oneOf(
+ Gen.oneOf(0, 1, TimestampNanosVal.MAX_NANOS_WITHIN_MICRO),
+ Gen.choose(0, TimestampNanosVal.MAX_NANOS_WITHIN_MICRO)
+ ).map(truncate)
+ }
+
+ // Builds a generator of nanosecond-timestamp literals of the given `dataType`, mixing uniform
+ // random values with the precision-truncated `specialNanosTs` edge-case corpus. The `special`
+ // values are supplied as already-converted `TimestampNanosVal`s so this helper is shared by the
+ // NTZ and LTZ variants, which differ only in the external-to-physical conversion and the type.
+ private def nanosLiteralGen(
+ precision: Int,
+ dataType: DataType,
+ special: Seq[TimestampNanosVal]): Gen[Literal] = {
+ val random = for {
+ micros <- microsGen
+ nanos <- nanosWithinMicroGen(precision)
+ } yield TimestampNanosVal.fromParts(micros, nanos.toShort)
+ Gen.oneOf(random, Gen.oneOf(special)).map(Literal.create(_, dataType))
+ }
+
+ def timestampNTZNanosLiteralGen(precision: Int): Gen[Literal] = {
+ val truncate = TimestampNanosTestUtils.nanoOfSecTruncator(precision)
+ val special = TimestampNanosTestUtils.specialNanosTs.map { s =>
+ val ldt = TimestampNanosTestUtils.parseSpecialNanosNTZ(s)
+ TimestampNanosTestUtils.localDateTimeToNanosVal(ldt.withNano(truncate(ldt.getNano)))
+ }
+ nanosLiteralGen(precision, TimestampNTZNanosType(precision), special)
+ }
+
+ def timestampLTZNanosLiteralGen(precision: Int): Gen[Literal] = {
+ val truncate = TimestampNanosTestUtils.nanoOfSecTruncator(precision)
+ val zoneId = ZoneId.systemDefault()
+ val special = TimestampNanosTestUtils.specialNanosTs.map { s =>
+ val instant = TimestampNanosTestUtils.parseSpecialNanosLTZ(s, zoneId)
+ TimestampNanosTestUtils.instantToNanosVal(
+ Instant.ofEpochSecond(instant.getEpochSecond, truncate(instant.getNano).toLong))
+ }
+ nanosLiteralGen(precision, TimestampLTZNanosType(precision), special)
+ }
+
// Valid range for DateType and TimestampType is [0001-01-01, 9999-12-31]
private val maxIntervalInMonths: Int = 10000 * 12
@@ -208,6 +265,8 @@ object LiteralGenerator {
case _: TimeType => timeLiteralGen
case TimestampType => timestampLiteralGen
case TimestampNTZType => timestampNTZLiteralGen
+ case t: TimestampNTZNanosType => timestampNTZNanosLiteralGen(t.precision)
+ case t: TimestampLTZNanosType => timestampLTZNanosLiteralGen(t.precision)
case BooleanType => booleanLiteralGen
case StringType => stringLiteralGen
case BinaryType => binaryLiteralGen
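The special-value branches in the diff above truncate each corpus entry to the declared precision before converting it. A standalone sketch of that step follows, together with the split into the `(epochMicros, nanosWithinMicro)` pair that a `TimestampNanosVal` carries. `SpecialValueSketch` is hypothetical, and its `truncate` only mimics the described behavior of `nanoOfSecTruncator`. The NTZ path assumes the wall-clock value is read in UTC, mirroring how Spark stores microsecond `TIMESTAMP_NTZ` values.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}

// Hypothetical sketch, not Spark's helpers: truncate a timestamp to TIMESTAMP(p),
// then split it into (epochMicros, nanosWithinMicro).
object SpecialValueSketch {
  // Keep p fractional-second digits (p in [7, 9]) of a nano-of-second value.
  def truncate(precision: Int)(nanoOfSecond: Int): Int = {
    require(precision >= 7 && precision <= 9, s"unsupported precision $precision")
    val step = Seq(100, 10, 1)(precision - 7)
    nanoOfSecond - nanoOfSecond % step
  }

  // getEpochSecond is floored and getNano is in [0, 999_999_999], so this split
  // is also correct for instants before 1970.
  def toParts(i: Instant): (Long, Int) =
    (i.getEpochSecond * 1000000L + i.getNano / 1000, i.getNano % 1000)

  // LTZ path: keep the epoch second, truncate the nano-of-second.
  def ltzParts(i: Instant, precision: Int): (Long, Int) =
    toParts(Instant.ofEpochSecond(i.getEpochSecond, truncate(precision)(i.getNano).toLong))

  // NTZ path: truncate the wall-clock nano, then interpret the result in UTC (assumption).
  def ntzParts(ldt: LocalDateTime, precision: Int): (Long, Int) =
    toParts(ldt.withNano(truncate(precision)(ldt.getNano)).toInstant(ZoneOffset.UTC))
}
```

For example, `1969-12-31T23:59:59.123456789Z` splits into `(-876544, 789)`, and truncating it to precision 7 yields `(-876544, 700)`.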