[ https://issues.apache.org/jira/browse/SPARK-57587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk reassigned SPARK-57587:
--------------------------------
Assignee: Max Gekk
> Generate TIME values with the declared precision in LiteralGenerator and
> RandomDataGenerator
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-57587
> URL: https://issues.apache.org/jira/browse/SPARK-57587
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. What
> {{LiteralGenerator}} and {{RandomDataGenerator}} generate {{TIME}} values
> without honoring the declared {{TimeType}} precision:
> * {{LiteralGenerator.timeLiteralGen}} always returns a literal of the default
> type {{TimeType()}} (precision 6) and does not truncate the value to the
> requested precision. {{randomGen}} dispatches with {{case _: TimeType =>
> timeLiteralGen}}, so {{randomGen(TimeType(p))}} yields a {{TimeType(6)}}
> literal regardless of {{p}}. The literal's declared type does not match the
> requested type, and the value can even carry sub-microsecond digits because
> the range is computed from microseconds while {{Gen.choose}} draws arbitrary
> nanoseconds.
> * {{RandomDataGenerator.forType}} matches {{case _: TimeType =>}} and always
> produces microsecond-granular {{LocalTime}} values (and non-conforming
> special values such as {{23:59:59.999999}}) regardless of the precision.
>
> As a result, the precision dimension of {{DataTypeTestUtils.timeTypes}} is
> never actually exercised: iterating over {{TimeType(MIN_PRECISION)}} and
> {{TimeType(MAX_PRECISION)}} produces effectively identical data. The suites
> that loop over {{ordered}} / {{atomicTypes}} / {{propertyCheckSupported}}
> (e.g. {{PredicateSuite}}, {{ConditionalExpressionSuite}},
> {{ArithmeticExpressionSuite}}, {{OrderingSuite}}, {{SortSuite}},
> {{CastSuite}}, {{RandomDataGeneratorSuite}}) silently test a single precision.
> h2. Why are the changes needed?
> This is the precision-conformance follow-up to SPARK-51403 (TIME as
> ordered/atomic type) and SPARK-51669 (random TIME values in tests).
> SPARK-57551 raises {{TimeType.MAX_PRECISION}} from 6 to 9, which widens the
> gap: the generators must cover {{TIME(0)}} .. {{TIME(9)}} and produce values
> whose fractional seconds match the declared precision.
> h2. Proposed changes (test-only)
> * {{LiteralGenerator}}: replace the {{timeLiteralGen}} {{val}} with a
> precision-aware {{def timeLiteralGen(timeType: TimeType)}} that draws
> nanoseconds over the full day range {{[0, NANOS_PER_DAY)}}, truncates with
> {{DateTimeUtils.truncateTimeToPrecision(nanos, timeType.precision)}}, and
> yields {{Literal(value, timeType)}}; dispatch via {{case t: TimeType =>
> timeLiteralGen(t)}}.
> * {{RandomDataGenerator}}: match {{case t: TimeType =>}} and truncate both
> the random draw and the special values to {{t.precision}} before
> {{nanosToLocalTime}}, mirroring the nanosecond-timestamp generators.
> * Optionally broaden {{DataTypeTestUtils.timeTypes}} with an intermediate
> precision (e.g. {{TimeType(3)}}) so the loops exercise a value between the
> {{[0, 9]}} endpoints.
>
> No production code or versioning changes; this only fixes test data
> generation.
> h2. Does this PR introduce _any_ user-facing change?
> No.
> h2. How was this patch tested?
> By running the affected TIME-covering suites: {{LiteralGeneratorSuite}},
> {{RandomDataGeneratorSuite}}, {{PredicateSuite}},
> {{ConditionalExpressionSuite}}, {{ArithmeticExpressionSuite}},
> {{OrderingSuite}}, {{CastSuite}}/{{CastWithAnsiOnSuite}}, {{UnsafeRowSuite}},
> and {{SortSuite}}, and verifying generated {{TIME(p)}} data has at most {{p}}
> fractional-second digits.
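
The truncation and the "at most {{p}} fractional-second digits" check described above can be sketched without any Spark dependency. This is only an illustration under stated assumptions: {{TimePrecisionSketch}}, {{truncateToPrecision}}, {{randomTimeNanos}}, and {{fractionalDigits}} are hypothetical names, and {{truncateToPrecision}} is a stand-in for what {{DateTimeUtils.truncateTimeToPrecision}} is described as doing, not its actual implementation.

```scala
import java.time.LocalTime
import scala.util.Random

// Hypothetical, Spark-free sketch; none of these names exist in Spark itself.
object TimePrecisionSketch {
  val NanosPerSecond: Long = 1000000000L
  val NanosPerDay: Long = 24L * 60 * 60 * NanosPerSecond

  // Assumed behaviour of a precision truncation: zero out every
  // fractional-second digit beyond `precision` (0 to 9).
  def truncateToPrecision(nanos: Long, precision: Int): Long = {
    require(precision >= 0 && precision <= 9, s"unsupported precision: $precision")
    // 10^(9 - precision), computed with integer arithmetic only.
    val unit = (0 until (9 - precision)).foldLeft(1L)((acc, _) => acc * 10L)
    nanos - nanos % unit
  }

  // Draw nanoseconds over the full day range [0, NanosPerDay), then truncate.
  def randomTimeNanos(rng: Random, precision: Int): Long = {
    val nanos = java.lang.Math.floorMod(rng.nextLong(), NanosPerDay)
    truncateToPrecision(nanos, precision)
  }

  // Number of significant fractional-second digits in a nanos-of-day value.
  def fractionalDigits(nanos: Long): Int = {
    var frac = nanos % NanosPerSecond
    if (frac == 0L) {
      0
    } else {
      var digits = 9
      // Strip trailing zeros; each one removes a significant digit.
      while (frac % 10L == 0L) {
        frac /= 10L
        digits -= 1
      }
      digits
    }
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42L)
    for (p <- 0 to 9) {
      val nanos = randomTimeNanos(rng, p)
      assert(fractionalDigits(nanos) <= p)
      println(s"TIME($p): ${LocalTime.ofNanoOfDay(nanos)}")
    }
  }
}
```

The same truncation would apply to special values: under this sketch, the nanosecond end-of-day value {{23:59:59.999999999}} becomes {{23:59:59.999}} for precision 3 and {{23:59:59}} for precision 0.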