[ 
https://issues.apache.org/jira/browse/SPARK-57587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-57587:
--------------------------------

    Assignee: Max Gekk

> Generate TIME values with the declared precision in LiteralGenerator and 
> RandomDataGenerator
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57587
>                 URL: https://issues.apache.org/jira/browse/SPARK-57587
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>
> h2. What
> {{LiteralGenerator}} and {{RandomDataGenerator}} generate {{TIME}} values 
> without honoring the declared {{TimeType}} precision:
> * {{LiteralGenerator.timeLiteralGen}} always returns a literal of the default 
> type {{TimeType()}} (precision 6) and does not truncate the value to the 
> requested precision. {{randomGen}} dispatches with 
> {{case _: TimeType => timeLiteralGen}}, so {{randomGen(TimeType(p))}} yields 
> a {{TimeType(6)}} literal regardless of {{p}}. The literal's declared type 
> therefore does not match the requested type, and the value can even carry 
> sub-microsecond digits, because the range is computed from microseconds 
> while {{Gen.choose}} draws arbitrary nanoseconds.
> * {{RandomDataGenerator.forType}} matches {{case _: TimeType =>}} and always 
> produces microsecond-granular {{LocalTime}} values, including special values 
> such as {{23:59:59.999999}} that do not conform to precisions below 6, 
> regardless of the declared precision.
> As a result, the precision dimension of {{DataTypeTestUtils.timeTypes}} is 
> never actually exercised: iterating over {{TimeType(MIN_PRECISION)}} and 
> {{TimeType(MAX_PRECISION)}} produces effectively identical data. The suites 
> that loop over {{ordered}} / {{atomicTypes}} / {{propertyCheckSupported}} 
> (e.g. {{PredicateSuite}}, {{ConditionalExpressionSuite}}, 
> {{ArithmeticExpressionSuite}}, {{OrderingSuite}}, {{SortSuite}}, 
> {{CastSuite}}, {{RandomDataGeneratorSuite}}) silently test a single precision.
> h2. Why are the changes needed?
> This is the precision-conformance follow-up to SPARK-51403 (TIME as 
> ordered/atomic type) and SPARK-51669 (random TIME values in tests). 
> SPARK-57551 raises {{TimeType.MAX_PRECISION}} from 6 to 9, which widens the 
> gap: the generators must cover {{TIME(0)}} through {{TIME(9)}} and produce 
> values whose fractional seconds match the declared precision.
> h2. Proposed changes (test-only)
> * {{LiteralGenerator}}: replace the {{timeLiteralGen}} {{val}} with a 
> precision-aware {{def timeLiteralGen(timeType: TimeType)}} that draws 
> nanoseconds over the full day range {{[0, NANOS_PER_DAY)}}, truncates with 
> {{DateTimeUtils.truncateTimeToPrecision(nanos, timeType.precision)}}, and 
> yields {{Literal(value, timeType)}}; dispatch via 
> {{case t: TimeType => timeLiteralGen(t)}}.
> * {{RandomDataGenerator}}: match {{case t: TimeType =>}} and truncate both 
> the random draw and the special values to {{t.precision}} before 
> {{nanosToLocalTime}}, mirroring the nanosecond-timestamp generators.
> * Optionally broaden {{DataTypeTestUtils.timeTypes}} with an intermediate 
> precision (e.g. {{TimeType(3)}}) so the loops also exercise a precision 
> strictly between the endpoints 0 and 9.
> No production code or versioning changes; this only fixes test data 
> generation.
> h2. Does this PR introduce _any_ user-facing change?
> No.
> h2. How was this patch tested?
> By running the affected TIME-covering suites: {{LiteralGeneratorSuite}}, 
> {{RandomDataGeneratorSuite}}, {{PredicateSuite}}, 
> {{ConditionalExpressionSuite}}, {{ArithmeticExpressionSuite}}, 
> {{OrderingSuite}}, {{CastSuite}}/{{CastWithAnsiOnSuite}}, {{UnsafeRowSuite}}, 
> and {{SortSuite}}, and verifying generated {{TIME(p)}} data has at most {{p}} 
> fractional-second digits.
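
The truncate-then-verify idea in the description can be sketched without any Spark dependency. Everything named below (TimePrecisionSketch, truncateToPrecision, randomTime, fractionalDigits) is a hypothetical stand-in, not Spark code. The sketch assumes that DateTimeUtils.truncateTimeToPrecision zeroes out the fractional-second digits beyond the given precision; the real helper may behave differently.

```scala
import java.time.LocalTime
import scala.util.Random

object TimePrecisionSketch {
  // Nanoseconds in one day; plays the role of NANOS_PER_DAY from the issue text.
  val NanosPerDay: Long = 24L * 60 * 60 * 1000 * 1000 * 1000

  // Hypothetical stand-in for DateTimeUtils.truncateTimeToPrecision (assumed
  // semantics): zero out fractional-second digits beyond `precision` (0 to 9).
  def truncateToPrecision(nanos: Long, precision: Int): Long = {
    require(precision >= 0 && precision <= 9, s"precision out of range: $precision")
    // Nanoseconds per smallest step representable at this precision: 10^(9 - precision).
    val unit = (1 to (9 - precision)).foldLeft(1L)((acc, _) => acc * 10L)
    nanos - nanos % unit
  }

  // Draw a nanosecond-of-day in [0, NanosPerDay) and truncate it to `precision`.
  // floorMod introduces a negligible modulo bias, which is acceptable for a sketch.
  def randomTime(precision: Int, rng: Random): LocalTime = {
    val nanos = Math.floorMod(rng.nextLong(), NanosPerDay)
    LocalTime.ofNanoOfDay(truncateToPrecision(nanos, precision))
  }

  // Number of significant fractional-second digits, ignoring trailing zeros.
  def fractionalDigits(t: LocalTime): Int = {
    var n = t.getNano
    if (n == 0) 0
    else {
      var digits = 9
      while (n % 10 == 0) { n /= 10; digits -= 1 }
      digits
    }
  }

  // Check the property from the "How was this patch tested?" section:
  // generated TIME(p) data has at most p fractional-second digits.
  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    for (p <- 0 to 9; _ <- 1 to 1000) {
      assert(fractionalDigits(randomTime(p, rng)) <= p)
    }
  }
}
```

Because the nanosecond-of-day value is never negative, subtracting the remainder always rounds toward midnight, so a special value such as 23:59:59.999999 stays within the same day at every precision.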


