Max Gekk created SPARK-57587:
--------------------------------
Summary: Generate TIME values with the declared precision in
LiteralGenerator and RandomDataGenerator
Key: SPARK-57587
URL: https://issues.apache.org/jira/browse/SPARK-57587
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. What
{{LiteralGenerator}} and {{RandomDataGenerator}} generate {{TIME}} values
without honoring the declared {{TimeType}} precision:
* {{LiteralGenerator.timeLiteralGen}} always returns a literal of the default
type {{TimeType()}} (precision 6) and does not truncate the value to the
requested precision. {{randomGen}} dispatches with {{case _: TimeType =>
timeLiteralGen}}, so {{randomGen(TimeType(p))}} yields a {{TimeType(6)}}
literal regardless of {{p}}. The literal's declared type therefore does not
match the requested type, and the value can even carry sub-microsecond digits:
the range bounds are computed from microseconds, while {{Gen.choose}} draws
arbitrary nanosecond values.
* {{RandomDataGenerator.forType}} matches {{case _: TimeType =>}} and always
produces microsecond-granular {{LocalTime}} values (and non-conforming special
values such as {{23:59:59.999999}}) regardless of the precision.
As a result, the precision dimension of {{DataTypeTestUtils.timeTypes}} is
never actually exercised: iterating over {{TimeType(MIN_PRECISION)}} and
{{TimeType(MAX_PRECISION)}} produces effectively identical data. The suites
that loop over {{ordered}} / {{atomicTypes}} / {{propertyCheckSupported}} (e.g.
{{PredicateSuite}}, {{ConditionalExpressionSuite}},
{{ArithmeticExpressionSuite}}, {{OrderingSuite}}, {{SortSuite}}, {{CastSuite}},
{{RandomDataGeneratorSuite}}) silently test a single precision.
h2. Why are the changes needed?
This is the precision-conformance follow-up to SPARK-51403 (TIME as
ordered/atomic type) and SPARK-51669 (random TIME values in tests). SPARK-57551
raises {{TimeType.MAX_PRECISION}} from 6 to 9, which widens the gap: the
generators must cover {{TIME(0)}} .. {{TIME(9)}} and produce values whose
fractional seconds match the declared precision.
h2. Proposed changes (test-only)
* {{LiteralGenerator}}: replace the {{timeLiteralGen}} {{val}} with a
precision-aware {{def timeLiteralGen(timeType: TimeType)}} that draws
nanoseconds over the full day range {{[0, NANOS_PER_DAY)}}, truncates with
{{DateTimeUtils.truncateTimeToPrecision(nanos, timeType.precision)}}, and
yields {{Literal(value, timeType)}}; dispatch via {{case t: TimeType =>
timeLiteralGen(t)}}.
* {{RandomDataGenerator}}: match {{case t: TimeType =>}} and truncate both the
random draw and the special values to {{t.precision}} before
{{nanosToLocalTime}}, mirroring the nanosecond-timestamp generators.
* Optionally broaden {{DataTypeTestUtils.timeTypes}} with an intermediate
precision (e.g. {{TimeType(3)}}) so the loops also exercise a precision
strictly between the endpoints 0 and 9.
No production code or versioning changes; this only fixes test data generation.
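The draw-then-truncate step proposed above can be sketched roughly as follows. This is a standalone illustration, not the Spark code: {{TimePrecisionSketch}}, {{truncateToPrecision}} and {{randomTime}} are hypothetical names, and {{truncateToPrecision}} is only an assumed stand-in for {{DateTimeUtils.truncateTimeToPrecision}}, taken here to drop fractional-second digits beyond the requested precision.

```scala
import java.time.LocalTime
import scala.util.Random

// Hypothetical, self-contained sketch; none of these names come from Spark.
object TimePrecisionSketch {
  val NanosPerSecond: Long = 1000000000L
  val NanosPerDay: Long = 24L * 60L * 60L * NanosPerSecond

  // Assumed stand-in for DateTimeUtils.truncateTimeToPrecision:
  // keeps at most `precision` fractional-second digits (0 to 9).
  def truncateToPrecision(nanos: Long, precision: Int): Long = {
    require(precision >= 0 && precision <= 9, s"precision out of range: $precision")
    // Nanoseconds per smallest representable step at this precision,
    // e.g. 1000000 for precision 3 and 1 for precision 9.
    val unit = Seq.fill(9 - precision)(10L).product
    nanos - nanos % unit
  }

  // Draws a nanosecond-of-day in [0, NanosPerDay) and truncates it
  // before converting to LocalTime.
  def randomTime(precision: Int, rand: Random): LocalTime = {
    val nanos = java.lang.Math.floorMod(rand.nextLong(), NanosPerDay)
    LocalTime.ofNanoOfDay(truncateToPrecision(nanos, precision))
  }
}
```

Under these assumptions, the special value {{23:59:59.999999}} (86399999999000 nanoseconds of day) truncates to {{23:59:59}} at precision 0 and to {{23:59:59.999}} at precision 3, so the same helper can be applied to both random draws and special values.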
h2. Does this PR introduce _any_ user-facing change?
No.
h2. How was this patch tested?
By running the affected TIME-covering suites: {{LiteralGeneratorSuite}},
{{RandomDataGeneratorSuite}}, {{PredicateSuite}},
{{ConditionalExpressionSuite}}, {{ArithmeticExpressionSuite}},
{{OrderingSuite}}, {{CastSuite}}/{{CastWithAnsiOnSuite}}, {{UnsafeRowSuite}},
and {{SortSuite}}, and verifying generated {{TIME(p)}} data has at most {{p}}
fractional-second digits.
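One possible way to express the "at most {{p}} fractional-second digits" check is sketched below. {{TimePrecisionCheck}} and {{conformsTo}} are hypothetical names used only for illustration and are not part of the Spark test utilities.

```scala
import java.time.LocalTime

// Hypothetical helper; not part of Spark.
object TimePrecisionCheck {
  // True when `time` has at most `precision` fractional-second digits.
  def conformsTo(time: LocalTime, precision: Int): Boolean = {
    require(precision >= 0 && precision <= 9, s"precision out of range: $precision")
    // Nanoseconds per smallest representable step at this precision.
    val unit = Seq.fill(9 - precision)(10L).product
    time.getNano % unit == 0L
  }
}
```

For example, {{conformsTo(LocalTime.parse("23:59:59.999999"), 6)}} holds, while the same value does not conform to precision 3, which is the kind of mismatch the special values currently exhibit.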