(spark) branch master updated: [SPARK-57165][SQL][TEST] Add LiteralGenerator support for nanosecond-capable timestamp types

uros Wed, 03 Jun 2026 11:01:08 -0700

This is an automated email from the ASF dual-hosted git repository.

uros-b pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new b4777a99deef [SPARK-57165][SQL][TEST] Add LiteralGenerator support for nanosecond-capable timestamp types
b4777a99deef is described below

commit b4777a99deefedbf2bd75c18ee5a98f2b16a03c0
Author: Maxim Gekk <[email protected]>
AuthorDate: Wed Jun 3 20:00:46 2026 +0200

    [SPARK-57165][SQL][TEST] Add LiteralGenerator support for nanosecond-capable timestamp types
    
    ### What changes were proposed in this pull request?
    
    This PR extends the test-only `LiteralGenerator` (in `sql/catalyst/src/test/.../expressions/LiteralGenerator.scala`) to produce random `Literal`s for the nanosecond-capable timestamp types `TimestampNTZNanosType(p)` and `TimestampLTZNanosType(p)` (p in `[7, 9]`), and wires them into `randomGen`:
    
    ```scala
    case t: TimestampNTZNanosType => timestampNTZNanosLiteralGen(t.precision)
    case t: TimestampLTZNanosType => timestampLTZNanosLiteralGen(t.precision)
    ```
    
    Details:
    - New `timestampNTZNanosLiteralGen(precision)` / `timestampLTZNanosLiteralGen(precision)` build `Literal`s whose Catalyst value is `org.apache.spark.unsafe.types.TimestampNanosVal(epochMicros, nanosWithinMicro)` with the matching data type (random values are built directly from the internal `TimestampNanosVal`, not through an external `java.time` conversion).
    - A new microsecond-grained `microsGen` provides full sub-millisecond variation (the existing `millisGen` is millisecond-grained), using the same valid range `[0001-01-01 .. 9999-12-31]` as the micro generators.
    - `nanosWithinMicro` is random in `[0, 999]`, biased to include the edge values `{0, 1, 999}`, and respects the declared precision (`p=7` -> multiple of 100, `p=8` -> multiple of 10, `p=9` -> any value), reusing `TimestampNanosTestUtils.nanoOfSecTruncator`.
    - Entries from `TimestampNanosTestUtils.specialNanosTs` (SPARK-57034), truncated to the declared precision, are mixed in.
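    
    The precision rule in the third bullet can be sketched in plain Scala. This is a minimal, standalone illustration that assumes only what the bullet states: `truncateNanosWithinMicro` and `biasedNanosWithinMicro` are hypothetical names, not the real `TimestampNanosTestUtils.nanoOfSecTruncator` or the ScalaCheck generator added by this PR, and `scala.util.Random` stands in for `Gen`:
    
    ```scala
    import scala.util.Random

    object NanosWithinMicroSketch {
      // Largest nanosecond offset inside one microsecond.
      val MaxNanosWithinMicro = 999

      // Hypothetical helper: drop the digits finer than the declared precision.
      // p=7 keeps multiples of 100, p=8 multiples of 10, p=9 keeps every value.
      def truncateNanosWithinMicro(precision: Int)(nanos: Int): Int = {
        val step = precision match {
          case 7 => 100
          case 8 => 10
          case 9 => 1
          case other => throw new IllegalArgumentException(s"unsupported precision: $other")
        }
        nanos - nanos % step
      }

      // Hypothetical biased draw: about half the time pick an edge value
      // from {0, 1, 999}, otherwise a uniform value in [0, 999]; the result
      // is then truncated so it is valid for the declared precision.
      def biasedNanosWithinMicro(precision: Int, rnd: Random): Int = {
        val edges = Array(0, 1, MaxNanosWithinMicro)
        val raw =
          if (rnd.nextBoolean()) edges(rnd.nextInt(edges.length))
          else rnd.nextInt(MaxNanosWithinMicro + 1)
        truncateNanosWithinMicro(precision)(raw)
      }
    }
    ```
    
    Under this sketch, the raw value 999 becomes 900 at `p=7`, 990 at `p=8`, and stays 999 at `p=9`, which is why the edge values `{0, 1, 999}` are only all observable at full precision.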
    
    The existing `randomGen` cases for `TimestampType` / `TimestampNTZType` are unchanged. The row/value-level counterpart (`RandomDataGenerator`) and the shared `TimestampNanosTestUtils` helpers were already added by SPARK-57034; this PR is the expression-literal counterpart and reuses those helpers.
    
    ### Why are the changes needed?
    
    `LiteralGenerator.randomGen` is the literal source for ScalaCheck property checks across expression suites (interpreted-vs-codegen consistency via `ExpressionEvalHelper`, ordering/predicate/hash suites, etc.). Previously it threw `IllegalArgumentException` for the nanos types, so no property-based suite could exercise `TimestampNTZNanosType` / `TimestampLTZNanosType`. This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. Test-only change; no production code or behavior change.
    
    ### How was this patch tested?
    
    - Added a targeted test in `LiteralExpressionSuite` that, for each precision in `{7, 8, 9}` and both nanos types, confirms that the generated literals evaluate identically under interpreted and codegen evaluation, are valid for the declared precision, and expose visible nanosecond variation, including the edge values `{0, 1, 999}` at full precision.
    - Ran:
    ```
    build/sbt 'catalyst/testOnly *LiteralExpressionSuite *DateExpressionsSuite'
    ```
    All 122 tests pass.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Cursor
    
    Closes #56298 from MaxGekk/nanos-lit-gen.
    
    Authored-by: Maxim Gekk <[email protected]>
    Signed-off-by: Uros Bojanic <[email protected]>
---
 .../expressions/LiteralExpressionSuite.scala       | 34 +++++++++++
 .../catalyst/expressions/LiteralGenerator.scala    | 65 +++++++++++++++++++++-
 2 files changed, 96 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala
index d364f7830dec..40d7131269f7 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala
@@ -32,6 +32,7 @@ import org.apache.spark.sql.catalyst.encoders.ExamplePointUDT
 import org.apache.spark.sql.catalyst.util.DateTimeConstants._
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils.localTime
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.types.DayTimeIntervalType._
@@ -101,6 +102,39 @@ class LiteralExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkEvaluation(Literal.default(VarcharType(5)), "")
   }
 
+  test("SPARK-57165: random literals for nanosecond-capable timestamp types") {
+    TimestampNanosTestUtils.foreachNanosPrecision { precision =>
+      val truncate = TimestampNanosTestUtils.nanoOfSecTruncator(precision)
+      Seq(TimestampNTZNanosType(precision), TimestampLTZNanosType(precision)).foreach { dt =>
+        val gen = LiteralGenerator.randomGen(dt)
+        // Interpreted and codegen evaluation of the generated literals must agree.
+        forAll(gen) { (lit: Literal) =>
+          assert(lit.dataType === dt)
+          val v = lit.value.asInstanceOf[TimestampNanosVal]
+          val nanos = v.nanosWithinMicro.toInt
+          assert(nanos >= 0 && nanos <= TimestampNanosVal.MAX_NANOS_WITHIN_MICRO,
+            s"nanosWithinMicro $nanos out of range for $dt")
+          assert(truncate(nanos) == nanos,
+            s"nanosWithinMicro $nanos is not valid for precision $precision")
+          cmpInterpretWithCodegen(EmptyRow, lit)
+        }
+        // The generator must expose visible, precision-valid nanosecond variation.
+        val sampled = (1 to 5000)
+          .flatMap(_ => gen.sample)
+          .map(_.value.asInstanceOf[TimestampNanosVal].nanosWithinMicro.toInt)
+          .toSet
+        assert(sampled.size > 1, s"expected nanosecond variation for $dt")
+        assert(sampled.forall(n => n >= 0 && truncate(n) == n))
+        // At full precision the edge values {0, 1, 999} must show up.
+        if (precision == TimestampNTZNanosType.NANOS_PRECISION) {
+          Seq(0, 1, TimestampNanosVal.MAX_NANOS_WITHIN_MICRO).foreach { edge =>
+            assert(sampled.contains(edge), s"expected edge value $edge for $dt")
+          }
+        }
+      }
+    }
+  }
+
   test("boolean literals") {
     checkEvaluation(Literal(true), true)
     checkEvaluation(Literal(false), false)
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
index 49d6c73f506c..1e082e0f1163 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.catalyst.expressions
 
 import java.sql.{Date, Timestamp}
-import java.time.{Duration, Instant, LocalDate, LocalTime, Period}
+import java.time.{Duration, Instant, LocalDate, LocalTime, Period, ZoneId}
 import java.util.concurrent.TimeUnit
 
 import org.scalacheck.{Arbitrary, Gen}
@@ -26,9 +26,10 @@ import org.scalatest.Assertions._
 
 import org.apache.spark.sql.catalyst.util.DateTimeConstants.{MICROS_PER_MILLIS, MILLIS_PER_DAY, NANOS_PER_MICROS}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
-import org.apache.spark.sql.catalyst.util.DateTimeUtils.{localTimeToNanos, nanosToMicros}
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.{instantToMicros, localTimeToNanos, nanosToMicros}
+import org.apache.spark.sql.catalyst.util.TimestampNanosTestUtils
 import org.apache.spark.sql.types._
-import org.apache.spark.unsafe.types.CalendarInterval
+import org.apache.spark.unsafe.types.{CalendarInterval, TimestampNanosVal}
 
 /**
  * Property is a high-level specification of behavior that should hold for a range of data points.
@@ -154,6 +155,62 @@ object LiteralGenerator {
         DateTimeUtils.microsToLocalDateTime(millis * MICROS_PER_MILLIS), TimestampNTZType)
   }
 
+  // Microsecond-grained epoch generator for the nanosecond timestamp types. Unlike `millisGen`,
+  // which is millisecond-grained and therefore never yields sub-millisecond fractional digits,
+  // this draws over the full microsecond range so generated values exercise sub-millisecond
+  // variation. Bounds match the valid range used by the microsecond generators.
+  private def microsGen = {
+    val minMicros = instantToMicros(Instant.parse("0001-01-01T00:00:00.000000Z"))
+    val maxMicros = instantToMicros(Instant.parse("9999-12-31T23:59:59.999999Z"))
+    Gen.choose(minMicros, maxMicros)
+  }
+
+  // Generates a `nanosWithinMicro` value in [0, 999], biased to include the edge values
+  // {0, 1, 999}, and truncated to the declared precision so the result is valid for
+  // TIMESTAMP(precision): p=7 -> multiple of 100, p=8 -> multiple of 10, p=9 -> any value.
+  private def nanosWithinMicroGen(precision: Int): Gen[Int] = {
+    val truncate = TimestampNanosTestUtils.nanoOfSecTruncator(precision)
+    Gen.oneOf(
+      Gen.oneOf(0, 1, TimestampNanosVal.MAX_NANOS_WITHIN_MICRO),
+      Gen.choose(0, TimestampNanosVal.MAX_NANOS_WITHIN_MICRO)
+    ).map(truncate)
+  }
+
+  // Builds a generator of nanosecond-timestamp literals of the given `dataType`, mixing uniform
+  // random values with the precision-truncated `specialNanosTs` edge-case corpus. The `special`
+  // values are supplied as already-converted `TimestampNanosVal`s so this helper is shared by the
+  // NTZ and LTZ variants, which differ only in the external-to-physical conversion and the type.
+  private def nanosLiteralGen(
+      precision: Int,
+      dataType: DataType,
+      special: Seq[TimestampNanosVal]): Gen[Literal] = {
+    val random = for {
+      micros <- microsGen
+      nanos <- nanosWithinMicroGen(precision)
+    } yield TimestampNanosVal.fromParts(micros, nanos.toShort)
+    Gen.oneOf(random, Gen.oneOf(special)).map(Literal.create(_, dataType))
+  }
+
+  def timestampNTZNanosLiteralGen(precision: Int): Gen[Literal] = {
+    val truncate = TimestampNanosTestUtils.nanoOfSecTruncator(precision)
+    val special = TimestampNanosTestUtils.specialNanosTs.map { s =>
+      val ldt = TimestampNanosTestUtils.parseSpecialNanosNTZ(s)
+      TimestampNanosTestUtils.localDateTimeToNanosVal(ldt.withNano(truncate(ldt.getNano)))
+    }
+    nanosLiteralGen(precision, TimestampNTZNanosType(precision), special)
+  }
+
+  def timestampLTZNanosLiteralGen(precision: Int): Gen[Literal] = {
+    val truncate = TimestampNanosTestUtils.nanoOfSecTruncator(precision)
+    val zoneId = ZoneId.systemDefault()
+    val special = TimestampNanosTestUtils.specialNanosTs.map { s =>
+      val instant = TimestampNanosTestUtils.parseSpecialNanosLTZ(s, zoneId)
+      TimestampNanosTestUtils.instantToNanosVal(
+        Instant.ofEpochSecond(instant.getEpochSecond, truncate(instant.getNano).toLong))
+    }
+    nanosLiteralGen(precision, TimestampLTZNanosType(precision), special)
+  }
+
   // Valid range for DateType and TimestampType is [0001-01-01, 9999-12-31]
   private val maxIntervalInMonths: Int = 10000 * 12
 
@@ -208,6 +265,8 @@ object LiteralGenerator {
       case _: TimeType => timeLiteralGen
       case TimestampType => timestampLiteralGen
       case TimestampNTZType => timestampNTZLiteralGen
+      case t: TimestampNTZNanosType => timestampNTZNanosLiteralGen(t.precision)
+      case t: TimestampLTZNanosType => timestampLTZNanosLiteralGen(t.precision)
       case BooleanType => booleanLiteralGen
       case StringType => stringLiteralGen
       case BinaryType => binaryLiteralGen

