Max Gekk created SPARK-57034:
--------------------------------

             Summary: Add TimestampNanosTestUtils and RandomDataGenerator support for nanosecond timestamps
                 Key: SPARK-57034
                 URL: https://issues.apache.org/jira/browse/SPARK-57034
             Project: Spark
          Issue Type: Sub-task
          Components: SQL, Tests
    Affects Versions: 4.2.0
            Reporter: Max Gekk


h2. Summary

Introduce shared *test infrastructure* for nanosecond-capable timestamps: a {{TimestampNanosTestUtils}} helper (parallel to {{DateTimeTestUtils}}) and {{RandomDataGenerator}} support for {{TimestampNTZNanosType(p)}} / {{TimestampLTZNanosType(p)}}, with a fixed edge-case corpus and seeded random values at *nanosecond* precision.

The deliverable is *test code only*; it adds no user-facing API.

h2. Background

Spark datetime tests rely on shared helpers today:

* {{DateTimeTestUtils}}: fixed {{LocalDateTime}} / micro {{Long}} builders, time zones, Julian/Gregorian edge handling
* {{RandomDataGenerator.forType}}: random values per {{DataType}}; {{TimestampType}} → {{Instant}}, {{TimestampNTZType}} → {{LocalDateTime}} via {{uniformMicrosRand}} (micro precision only)
* {{specialTs}} corpus in {{RandomDataGenerator}} and {{CastSuiteBase}}: epoch, 1582 cutover, 0001, 9999 (no sub-micro fractional digits)

Nanosecond row/unsafe tests ([SPARK-56981|https://github.com/apache/spark/pull/56059]) use hand-written {{TimestampNTZNanos(epochMicros, nanos)}} literals. Upcoming work (casts, coercion, hash, Parquet, benchmarks, expression parity) needs reusable fixed values, random generators, and {{java.time}}-based oracles, without duplicating boilerplate in every suite.

Sub-task *1b* provides conversion between {{java.time}} values and the composite ({{epochMicros}}, {{nanosWithinMicro}}) representation; this ticket consumes those helpers in test utilities and generators.
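For orientation, here is a minimal Scala sketch of what a {{java.time}} ↔ composite conversion could look like, assuming the composite stores floor-divided {{epochMicros}} (UTC wall clock for NTZ, the UTC time line for LTZ) plus a normalized {{nanosWithinMicro}} in \[0, 999\]. {{TimestampNanosComposite}} and {{NanosConversions}} are hypothetical names; the real *1b* helpers and field types may differ.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}

// Hypothetical stand-in for the composite value; the real Spark class and
// its field types may differ.
final case class TimestampNanosComposite(epochMicros: Long, nanosWithinMicro: Int) {
  require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999, s"not normalized: $nanosWithinMicro")
}

object NanosConversions {
  private val MicrosPerSecond = 1000000L

  // NTZ: interpret the wall-clock value as if it were in UTC.
  def fromLocalDateTime(ldt: LocalDateTime): TimestampNanosComposite = {
    val seconds = ldt.toEpochSecond(ZoneOffset.UTC)
    val nano = ldt.getNano // always in [0, 999999999]
    TimestampNanosComposite(seconds * MicrosPerSecond + nano / 1000, nano % 1000)
  }

  def toLocalDateTime(c: TimestampNanosComposite): LocalDateTime = {
    // floorDiv/floorMod keep pre-epoch values correct.
    val seconds = Math.floorDiv(c.epochMicros, MicrosPerSecond)
    val microOfSecond = Math.floorMod(c.epochMicros, MicrosPerSecond)
    LocalDateTime.ofEpochSecond(
      seconds, (microOfSecond * 1000 + c.nanosWithinMicro).toInt, ZoneOffset.UTC)
  }

  // LTZ: an Instant is already a point on the UTC time line.
  def fromInstant(i: Instant): TimestampNanosComposite = {
    val nano = i.getNano
    TimestampNanosComposite(i.getEpochSecond * MicrosPerSecond + nano / 1000, nano % 1000)
  }

  def toInstant(c: TimestampNanosComposite): Instant = {
    val seconds = Math.floorDiv(c.epochMicros, MicrosPerSecond)
    val microOfSecond = Math.floorMod(c.epochMicros, MicrosPerSecond)
    Instant.ofEpochSecond(seconds, microOfSecond * 1000 + c.nanosWithinMicro)
  }
}
```

Floor division keeps pre-epoch values normalized: {{1969-12-31T23:59:59.999999999}} maps to {{epochMicros = -1}}, {{nanosWithinMicro = 999}} rather than to a negative nanosecond remainder.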

h2. Scope

h3. 1. {{TimestampNanosTestUtils}} (new, {{sql/catalyst/src/test/.../util/}})

Add an object modeled on {{DateTimeTestUtils}}:

* Readable fixed-value builders, e.g. {{timestampNTZ(year, month, day, hour, minute, sec, micros, nanosWithinMicro)}} returning {{TimestampNTZNanos}} (a {{micros}} argument, as in {{DateTimeTestUtils.date}}, is needed to express the first six fractional digits); an LTZ variant takes a {{ZoneId}} where needed
* Convenience wrappers producing {{LocalDateTime}} / {{Instant}} for the same instants (delegating to the *1b* conversion helpers)
* Shared constants: default test zone IDs (reuse {{DateTimeTestUtils.UTC}}, {{PST}}, etc.)
* Optional {{gridTest}} / precision-loop helpers for *p* in \[7, 9\] (mirroring patterns in existing datetime suites)
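A minimal Scala sketch of such builders, assuming the fraction is split into {{micros}} plus {{nanosWithinMicro}} and that NTZ wall-clock fields are interpreted in UTC. {{TimestampNanosTestUtilsSketch}}, {{localDateTime}}, {{instant}}, {{truncateToPrecision}} and the {{TimestampNTZNanos}} stand-in are hypothetical; the real object would delegate to the *1b* helpers instead of computing the composite inline.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical stand-in for Spark's TimestampNTZNanos; the real class may differ.
final case class TimestampNTZNanos(epochMicros: Long, nanosWithinMicro: Int)

// Hypothetical sketch of TimestampNanosTestUtils-style builders.
object TimestampNanosTestUtilsSketch {
  private def checkFraction(micros: Int, nanosWithinMicro: Int): Unit = {
    require(micros >= 0 && micros <= 999999, s"micros out of range: $micros")
    require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
      s"nanosWithinMicro out of range: $nanosWithinMicro")
  }

  def localDateTime(year: Int, month: Int, day: Int, hour: Int = 0, minute: Int = 0,
      sec: Int = 0, micros: Int = 0, nanosWithinMicro: Int = 0): LocalDateTime = {
    checkFraction(micros, nanosWithinMicro)
    LocalDateTime.of(year, month, day, hour, minute, sec, micros * 1000 + nanosWithinMicro)
  }

  // NTZ builder: wall-clock fields interpreted in UTC, split into the composite.
  def timestampNTZ(year: Int, month: Int, day: Int, hour: Int = 0, minute: Int = 0,
      sec: Int = 0, micros: Int = 0, nanosWithinMicro: Int = 0): TimestampNTZNanos = {
    val ldt = localDateTime(year, month, day, hour, minute, sec, micros, nanosWithinMicro)
    TimestampNTZNanos(ldt.toEpochSecond(ZoneOffset.UTC) * 1000000L + micros, nanosWithinMicro)
  }

  // LTZ builder: the same wall-clock fields resolved in the given zone.
  def instant(zoneId: ZoneId, year: Int, month: Int, day: Int, hour: Int = 0,
      minute: Int = 0, sec: Int = 0, micros: Int = 0, nanosWithinMicro: Int = 0): Instant =
    localDateTime(year, month, day, hour, minute, sec, micros, nanosWithinMicro)
      .atZone(zoneId).toInstant

  // Truncates the fraction to precision p in [7, 9], for per-precision loops.
  def truncateToPrecision(ldt: LocalDateTime, p: Int): LocalDateTime = {
    require(p >= 7 && p <= 9, s"precision out of range: $p")
    val unit = Seq.fill(9 - p)(10).product // 100, 10 or 1
    ldt.withNano(ldt.getNano / unit * unit)
  }
}
```

Requiring both fraction parts to be in range keeps every value produced by the builders normalized, which is exactly what the infrastructure suite is meant to check.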

h3. 2. Edge-case corpus ({{specialNanosTs}})

Define a shared sequence of nanosecond timestamp strings and/or {{java.time}} values that extends the micro {{specialTs}} set with 7–9 fractional digits, e.g.:

* {{1970-01-01 00:00:00.000000001}} ({{nanosWithinMicro = 1}})
* {{1582-10-15 23:59:59.123456789}} ({{nanosWithinMicro = 789}})
* {{9999-12-31 23:59:59.999999999}} ({{nanosWithinMicro = 999}})
* Existing corpus dates (0001, epoch, 9999) with {{nanosWithinMicro}} in \{0, 1, 999\}

Expose from {{TimestampNanosTestUtils}} for reuse in {{CastSuiteBase}}, Parquet 
fixtures, and benchmarks.
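One possible shape for the corpus, sketched in Scala under the assumption that entries are stored as strings and parsed with {{java.time}}; {{SpecialNanosTsSketch}} and {{toLocalDateTime}} are hypothetical names. Replacing the space with {{T}} lets {{LocalDateTime.parse}} (ISO-8601, up to nine fractional digits) read every entry.

```scala
import java.time.LocalDateTime

object SpecialNanosTsSketch {
  // Explicit edge cases listed in the ticket.
  private val explicit = Seq(
    "1970-01-01 00:00:00.000000001",
    "1582-10-15 23:59:59.123456789",
    "9999-12-31 23:59:59.999999999")

  // Existing corpus dates crossed with nanosWithinMicro in {0, 1, 999}.
  private val grid = for {
    base <- Seq("0001-01-01 00:00:00", "1970-01-01 00:00:00", "9999-12-31 23:59:59")
    nanos <- Seq(0, 1, 999)
  } yield f"$base.000000$nanos%03d"

  // distinct drops the epoch + 1 ns entry, which appears in both lists.
  val specialNanosTs: Seq[String] = (explicit ++ grid).distinct

  def toLocalDateTime(s: String): LocalDateTime = LocalDateTime.parse(s.replace(' ', 'T'))
}
```

Keeping the corpus as strings makes it reusable by cast tests that start from string input, while {{toLocalDateTime}} gives generators and Parquet fixtures the {{java.time}} form.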

h3. 3. Extend {{RandomDataGenerator.forType}}

Add cases for {{TimestampNTZNanosType(_)}} and {{TimestampLTZNanosType(_)}}:

* *Uniform random:* {{uniformMicrosRand}} for {{epochMicros}} plus {{rand.nextInt(1000)}} for {{nanosWithinMicro}} (always normalized)
* *External representation:* return {{LocalDateTime}} (NTZ) / {{Instant}} (LTZ), the same convention as the micro timestamp generators, so caller code never handles raw {{TimestampNTZNanos}} pairs
* *Special values:* mix in {{specialNanosTs}} corpus entries
* *{{validJulianDatetime}}:* reuse the existing flag and Proleptic Gregorian shift logic from the micro generator
* *Nullable:* honor the {{nullable}} parameter (emit a fraction of nulls, as existing generators do)
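A minimal, self-contained Scala sketch of the NTZ branch, written as a standalone helper rather than against the real {{RandomDataGenerator}} internals. {{NanosGeneratorSketch}}, {{uniformMicros}} (a stand-in for {{uniformMicrosRand}}) and the 10% special-value and null rates are hypothetical, and the {{validJulianDatetime}} shift is omitted. An LTZ branch could build an {{Instant}} from the same seconds and nano-of-second via {{Instant.ofEpochSecond}}.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import scala.util.Random

object NanosGeneratorSketch {
  private val MicrosPerSecond = 1000000L
  // Micros range 0001-01-01T00:00:00 .. 9999-12-31T23:59:59.999999 (UTC wall clock).
  private val MinMicros =
    LocalDateTime.of(1, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC) * MicrosPerSecond
  private val MaxMicros =
    LocalDateTime.of(9999, 12, 31, 23, 59, 59).toEpochSecond(ZoneOffset.UTC) * MicrosPerSecond + 999999L

  // Hypothetical stand-in for uniformMicrosRand; the modulo bias is negligible for tests.
  private def uniformMicros(rand: Random): Long =
    MinMicros + Math.floorMod(rand.nextLong(), MaxMicros - MinMicros + 1)

  private val SpecialRate = 0.1f // hypothetical
  private val NullRate = 0.1f    // hypothetical

  def forNTZNanos(rand: Random, nullable: Boolean,
      special: IndexedSeq[LocalDateTime]): () => Any = {
    val generator: () => LocalDateTime = () =>
      if (special.nonEmpty && rand.nextFloat() < SpecialRate) {
        special(rand.nextInt(special.length))
      } else {
        val epochMicros = uniformMicros(rand)
        val nanosWithinMicro = rand.nextInt(1000) // always normalized to [0, 999]
        val seconds = Math.floorDiv(epochMicros, MicrosPerSecond)
        val nanoOfSecond = Math.floorMod(epochMicros, MicrosPerSecond) * 1000 + nanosWithinMicro
        LocalDateTime.ofEpochSecond(seconds, nanoOfSecond.toInt, ZoneOffset.UTC)
      }
    if (nullable) (() => if (rand.nextFloat() < NullRate) null else generator())
    else generator
  }
}
```

Returning {{() => Any}} mirrors the untyped generator style of {{forType}}, while the inner generator stays typed as {{LocalDateTime}}.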

h3. 4. Unit tests for the infrastructure itself

New suite (e.g. {{TimestampNanosTestUtilsSuite}}):

* Fixed builders produce normalized values ({{nanosWithinMicro}} in \[0, 999\])
* The generator from {{RandomDataGenerator.forType(TimestampNTZNanosType(9), nullable = false)}} returns non-null {{LocalDateTime}} values with varying nano-of-second
* Seeded roundtrip smoke test: {{Random(42)}}, e.g. 1000 iterations of generate → convert to composite → convert back → {{equals}} on the {{java.time}} value (uses *1b* helpers)
* {{specialNanosTs}} entries are parseable / convertible without exception
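A self-contained Scala sketch of the seeded roundtrip check, assuming the composite splits a {{LocalDateTime}} into floor-divided UTC {{epochMicros}} and {{nanosWithinMicro}}. {{NTZComposite}}, {{toComposite}}, {{fromComposite}} and {{randomLocalDateTime}} are hypothetical stand-ins for the *1b* helpers and the new generator; in the real suite the loop would sit in a ScalaTest {{test(...)}} body and call those utilities.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import scala.util.Random

// Hypothetical composite standing in for TimestampNTZNanos.
final case class NTZComposite(epochMicros: Long, nanosWithinMicro: Int)

object RoundTripSmokeTestSketch {
  private val MicrosPerSecond = 1000000L
  private val MinSecond = LocalDateTime.of(1, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC)
  private val MaxSecond = LocalDateTime.of(9999, 12, 31, 23, 59, 59).toEpochSecond(ZoneOffset.UTC)

  // Hypothetical stand-ins for the 1b conversion helpers.
  def toComposite(ldt: LocalDateTime): NTZComposite = {
    val nano = ldt.getNano
    NTZComposite(ldt.toEpochSecond(ZoneOffset.UTC) * MicrosPerSecond + nano / 1000, nano % 1000)
  }

  def fromComposite(c: NTZComposite): LocalDateTime =
    LocalDateTime.ofEpochSecond(
      Math.floorDiv(c.epochMicros, MicrosPerSecond),
      (Math.floorMod(c.epochMicros, MicrosPerSecond) * 1000 + c.nanosWithinMicro).toInt,
      ZoneOffset.UTC)

  // Stand-in for the nanosecond generator: uniform second plus uniform nano-of-second.
  private def randomLocalDateTime(rand: Random): LocalDateTime =
    LocalDateTime.ofEpochSecond(
      MinSecond + Math.floorMod(rand.nextLong(), MaxSecond - MinSecond + 1),
      rand.nextInt(1000000000),
      ZoneOffset.UTC)

  // Returns the number of checked values; fails on the first mismatch.
  def run(seed: Long = 42L, iterations: Int = 1000): Int = {
    val rand = new Random(seed)
    (1 to iterations).foreach { i =>
      val original = randomLocalDateTime(rand)
      val composite = toComposite(original)
      assert(composite.nanosWithinMicro >= 0 && composite.nanosWithinMicro <= 999,
        s"iteration $i: not normalized: $composite")
      val back = fromComposite(composite)
      assert(back == original, s"iteration $i: $original -> $composite -> $back")
    }
    iterations
  }
}
```

A fixed seed keeps any failure reproducible, and the failure message carries the original, composite and restored values so a mismatch can be turned into a fixed regression case.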

h2. Out of scope

* Production conversion or parsing logic (sub-tasks *1a*, *1b*, *1c*, *1d*)
* {{CatalystTypeConverters}} / Dataset encoder wiring (*1b*)
* Cast, hash, ordering, and Parquet implementation tests (those suites *consume* this infra in later tickets)
* SQL golden files ({{SQLQueryTestSuite}})
* Benchmark classes (they may reuse {{RandomDataGenerator}} / {{specialNanosTs}} after this lands)
* ScalaCheck / property-based framework introduction

h2. Implementation notes

* Keep all new code under {{src/test}}; main sources must not depend on the test utilities.
* Prefer {{LocalDateTime}} / {{Instant}} as the external type in {{RandomDataGenerator}} to match micro timestamp conventions and the *1b* converters.
* Do not change the behavior of existing {{RandomDataGenerator}} cases for {{TimestampType}} / {{TimestampNTZType}}.
* Consider a single shared {{specialNanosTs}} object referenced from {{RandomDataGenerator}} (and, optionally, from {{CastSuiteBase}} in a follow-up). Avoid large unrelated refactors in this ticket; exporting it from {{TimestampNanosTestUtils}} is sufficient.

h2. Acceptance criteria

* {{TimestampNanosTestUtils}} provides fixed builders and a {{specialNanosTs}} corpus usable from other test suites.
* {{RandomDataGenerator.forType(TimestampNTZNanosType(9))}} and {{RandomDataGenerator.forType(TimestampLTZNanosType(9))}} return {{Some(generator)}}; the generators produce {{LocalDateTime}} and {{Instant}} values, respectively, with nanosecond variation.
* {{TimestampNanosTestUtilsSuite}} (or equivalent) passes, including the seeded random roundtrip smoke test.
* Existing {{RandomDataGenerator}} and datetime test suites show no regressions.



