Max Gekk created SPARK-57587:
--------------------------------

             Summary: Generate TIME values with the declared precision in 
LiteralGenerator and RandomDataGenerator
                 Key: SPARK-57587
                 URL: https://issues.apache.org/jira/browse/SPARK-57587
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. What

{{LiteralGenerator}} and {{RandomDataGenerator}} generate {{TIME}} values 
without honoring the declared {{TimeType}} precision:

* {{LiteralGenerator.timeLiteralGen}} always returns a literal of the default 
type {{TimeType()}} (precision 6) and does not truncate the value to the 
requested precision. {{randomGen}} dispatches with {{case _: TimeType => 
timeLiteralGen}}, so {{randomGen(TimeType(p))}} yields a {{TimeType(6)}} 
literal regardless of {{p}}. The literal's declared type does not match the 
requested type, and the value can even carry sub-microsecond digits because the 
range is computed from microseconds while {{Gen.choose}} draws arbitrary 
nanoseconds.
* {{RandomDataGenerator.forType}} matches {{case _: TimeType =>}} and always 
produces microsecond-granular {{LocalTime}} values (and non-conforming special 
values such as {{23:59:59.999999}}) regardless of the precision.

As a result, the precision dimension of {{DataTypeTestUtils.timeTypes}} is 
never actually exercised: iterating over {{TimeType(MIN_PRECISION)}} and 
{{TimeType(MAX_PRECISION)}} produces effectively identical data. The suites 
that loop over {{ordered}} / {{atomicTypes}} / {{propertyCheckSupported}} (e.g. 
{{PredicateSuite}}, {{ConditionalExpressionSuite}}, 
{{ArithmeticExpressionSuite}}, {{OrderingSuite}}, {{SortSuite}}, {{CastSuite}}, 
{{RandomDataGeneratorSuite}}) silently test a single precision.

h2. Why are the changes needed?

This is the precision-conformance follow-up to SPARK-51403 (TIME as 
ordered/atomic type) and SPARK-51669 (random TIME values in tests). SPARK-57551 
raises {{TimeType.MAX_PRECISION}} from 6 to 9, which widens the gap: the 
generators must cover {{TIME(0)}} through {{TIME(9)}} and produce values whose 
fractional seconds match the declared precision.

h2. Proposed changes (test-only)

* {{LiteralGenerator}}: replace the {{timeLiteralGen}} {{val}} with a 
precision-aware {{def timeLiteralGen(timeType: TimeType)}} that draws 
nanoseconds over the full day range {{[0, NANOS_PER_DAY)}}, truncates with 
{{DateTimeUtils.truncateTimeToPrecision(nanos, timeType.precision)}}, and 
yields {{Literal(value, timeType)}}; dispatch via {{case t: TimeType => 
timeLiteralGen(t)}}.
* {{RandomDataGenerator}}: match {{case t: TimeType =>}} and truncate both the 
random draw and the special values to {{t.precision}} before 
{{nanosToLocalTime}}, mirroring the nanosecond-timestamp generators.
* Optionally broaden {{DataTypeTestUtils.timeTypes}} with an intermediate 
precision (e.g. {{TimeType(3)}}) so the loops also exercise a precision 
strictly between the endpoints 0 and 9.
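
The draw-then-truncate step from the first two bullets can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not the Spark code: {{truncateNanos}} is a hypothetical stand-in for {{DateTimeUtils.truncateTimeToPrecision}} (assumed here to round down to a multiple of 10^(9 - precision) nanoseconds), {{randomTimeNanos}} and {{maxTime}} are illustrative names, and {{scala.util.Random}} (Scala 2.13) replaces ScalaCheck's {{Gen}}.

```scala
import java.time.LocalTime
import scala.util.Random

object TimePrecisionSketch {
  val NanosPerDay: Long = 24L * 60 * 60 * 1000000000L

  // Hypothetical stand-in for DateTimeUtils.truncateTimeToPrecision:
  // assumed to drop fractional-second digits beyond `precision` (0..9).
  def truncateNanos(nanos: Long, precision: Int): Long = {
    require(precision >= 0 && precision <= 9, s"precision out of range: $precision")
    // 10^(9 - precision); the product of an empty Seq is 1.
    val unit = Seq.fill(9 - precision)(10L).product
    nanos - nanos % unit
  }

  // Draw nanoseconds over the full day [0, NanosPerDay), then truncate.
  def randomTimeNanos(precision: Int, rnd: Random): Long =
    truncateNanos(rnd.between(0L, NanosPerDay), precision)

  // Special values are truncated too, e.g. the last time of day.
  def maxTime(precision: Int): LocalTime =
    LocalTime.ofNanoOfDay(truncateNanos(NanosPerDay - 1, precision))
}
```

With this sketch, {{TimePrecisionSketch.maxTime(3)}} is {{23:59:59.999}} instead of the microsecond-granular {{23:59:59.999999}}, and every value from {{randomTimeNanos(p, rnd)}} is a multiple of 10^(9 - p) nanoseconds.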

No production code or versioning changes; this only fixes test data generation.

h2. Does this PR introduce _any_ user-facing change?

No.

h2. How was this patch tested?

By running the affected TIME-covering suites: {{LiteralGeneratorSuite}}, 
{{RandomDataGeneratorSuite}}, {{PredicateSuite}}, 
{{ConditionalExpressionSuite}}, {{ArithmeticExpressionSuite}}, 
{{OrderingSuite}}, {{CastSuite}}/{{CastWithAnsiOnSuite}}, {{UnsafeRowSuite}}, 
and {{SortSuite}}, and verifying generated {{TIME(p)}} data has at most {{p}} 
fractional-second digits.
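
One way to express the "at most {{p}} fractional-second digits" check is to count the significant digits of the nanosecond field. The sketch below is an assumption about how such an assertion could look, not the check used in the patch; {{fractionDigits}} and {{conformsTo}} are hypothetical helpers.

```scala
import java.time.LocalTime

object FractionDigitsCheck {
  // Number of significant fractional-second digits in `t` (0..9):
  // trailing zeros of the 9-digit nanosecond field do not count.
  def fractionDigits(t: LocalTime): Int = {
    val nanos = t.getNano
    if (nanos == 0) 0
    else {
      // Count trailing decimal zeros by repeatedly dividing by 10.
      val trailingZeros = Iterator.iterate(nanos)(_ / 10).takeWhile(_ % 10 == 0).size
      9 - trailingZeros
    }
  }

  // True when `t` fits the declared TIME(precision).
  def conformsTo(t: LocalTime, precision: Int): Boolean =
    fractionDigits(t) <= precision
}
```

For instance, {{conformsTo(LocalTime.parse("23:59:59.999999"), 3)}} is false, while {{conformsTo(LocalTime.parse("23:59:59.999"), 3)}} is true.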



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
