[ https://issues.apache.org/jira/browse/SPARK-57838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-57838:
-----------------------------
    Shepherd: Max Gekk

> Harden overflow and calendar-range handling for nanosecond-precision timestamps
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-57838
>                 URL: https://issues.apache.org/jira/browse/SPARK-57838
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond 
> precision).
> h2. Problem
> The SPIP flags overflow/range handling as a top risk, and the audit confirmed
> residual gaps:
> * {{TimestampNanosVal}} validates {{nanosWithinMicro}} but does not normalize
> carries; {{fromParts}} throws {{INTERNAL_ERROR}} on denormalized input.
> * Parse/cast overflow is either swallowed as {{None}} or surfaced as
> {{CAST_INVALID_INPUT}} rather than {{DATETIME_FIELD_OUT_OF_BOUNDS}}.
> * There is no explicit 0001-9999 range validation on the cast, parse, and
> interval-add paths.
> * {{timestampNanosAddDayTime}} has no overflow wrapper (unlike
> {{timestampAdd}}).
> * A single int64 epoch-nanos value cannot represent the full 0001-9999 range
> (only ~1677-2262). Parquet/Arrow/Avro sinks fail loudly on values outside
> that window (good), but this split between the composite {{(micros, nanos)}}
> form and int64 epoch nanos is under-documented.
> h2. Goal
> Consistent, well-typed overflow/range behavior with correct error classes, 
> plus boundary test coverage.
> h2. Scope
> * Add explicit representable-range validation on the parse, cast, and
> interval-add paths, raising {{DATETIME_FIELD_OUT_OF_BOUNDS}} /
> {{ARITHMETIC_OVERFLOW}}.
> * Audit the remaining {{* NANOS_PER_MICROS}} multiplications on unbounded
> {{epochMicros}} that do not use exact (overflow-checked) arithmetic.
> * Add min ({{0001-01-01T00:00:00.000000000}}) and max
> ({{9999-12-31T23:59:59.999999999}}) boundary tests across
> parse/format/cast/arithmetic.
> * Document the int64 epoch-nanos vs. {{(micros, nanos)}} composite range split
> for format consumers.
> h2. Acceptance criteria
> * Out-of-range parses and casts raise the datetime bounds error
> ({{DATETIME_FIELD_OUT_OF_BOUNDS}}), not a generic invalid-input error.
> * Arithmetic that overflows near the boundaries raises
> {{ARITHMETIC_OVERFLOW}}.
> * Boundary tests pass.
> * No value wraps silently.
> h2. Testing
> {{DateTimeUtilsSuite}}, {{CastSuiteBase}}, {{DateExpressionsSuite}}, 
> {{TimestampNanosParseSuite}}.
> h2. Dependencies
> Cross-cutting: coordinate with the timestampadd/timestampdiff,
> to_timestamp*, sequence, and timestamp-subtraction sub-tasks, whose
> arithmetic/parse paths this sub-task hardens; no hard blockers.
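
Carry normalization for the composite form can be sketched as follows. This is a minimal Java sketch, not Spark code, assuming the composite value is an {{epochMicros}} long plus a {{nanosWithinMicro}} in [0, 999]; the class {{CompositeNanos}} and its {{normalize}} method are hypothetical names. Floor division and floor modulo fold an out-of-range nanosecond count (negative or >= 1000) into the microsecond field, and {{Math.addExact}} turns an overflowing carry into an {{ArithmeticException}} instead of a silent wrap.

```java
// Hypothetical sketch (not Spark's TimestampNanosVal): normalizing a composite
// (epochMicros, nanosWithinMicro) timestamp so nanosWithinMicro is in [0, 999].
final class CompositeNanos {
    static final long NANOS_PER_MICROS = 1000L;

    final long epochMicros;
    final int nanosWithinMicro;

    private CompositeNanos(long epochMicros, int nanosWithinMicro) {
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
    }

    // Folds an arbitrary (possibly negative or >= 1000) nanosecond count into
    // the microsecond field instead of rejecting it as denormalized input.
    static CompositeNanos normalize(long epochMicros, long nanos) {
        // Whole microseconds contained in `nanos`, rounded toward negative infinity.
        long carry = Math.floorDiv(nanos, NANOS_PER_MICROS);
        // Remainder is always in [0, 999], even for negative `nanos`.
        int rem = (int) Math.floorMod(nanos, NANOS_PER_MICROS);
        // addExact throws ArithmeticException rather than wrapping silently.
        return new CompositeNanos(Math.addExact(epochMicros, carry), rem);
    }
}
```

For example, {{normalize(10, 1500)}} yields (11, 500) and {{normalize(10, -1)}} yields (9, 999). Whether {{fromParts}} should normalize like this, or keep rejecting denormalized input but with a proper error class instead of {{INTERNAL_ERROR}}, is left open here; the sketch shows only the normalizing option.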
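
One possible shape for the explicit 0001-9999 validation is shown below as a hypothetical Java sketch: {{NanosRangeCheck}} and {{requireInRange}} are illustrative names, timestamps are assumed to be interpreted in UTC, and {{java.time.DateTimeException}} stands in for the Spark error-class exception that would actually carry {{DATETIME_FIELD_OUT_OF_BOUNDS}}. Because a normalized {{nanosWithinMicro}} is always in [0, 999], checking {{epochMicros}} against the micros of the two boundary instants is enough.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch of an explicit 0001-9999 range check on the composite
// (epochMicros, nanosWithinMicro) form, with timestamps interpreted in UTC.
final class NanosRangeCheck {
    static final long MICROS_PER_SECOND = 1_000_000L;

    // 0001-01-01T00:00:00 UTC in epoch micros (smallest nanosWithinMicro is 0).
    static final long MIN_MICROS =
        LocalDateTime.of(1, 1, 1, 0, 0, 0).toEpochSecond(ZoneOffset.UTC) * MICROS_PER_SECOND;

    // 9999-12-31T23:59:59.999999 UTC in epoch micros; any nanosWithinMicro up
    // to 999 on top of this is still <= 9999-12-31T23:59:59.999999999.
    static final long MAX_MICROS =
        LocalDateTime.of(9999, 12, 31, 23, 59, 59).toEpochSecond(ZoneOffset.UTC) * MICROS_PER_SECOND
            + 999_999L;

    // Assumes nanosWithinMicro is already normalized to [0, 999]. Throws with a
    // message naming the error class the ticket asks for; in Spark this would be
    // an error-class exception rather than java.time.DateTimeException.
    static void requireInRange(long epochMicros) {
        if (epochMicros < MIN_MICROS || epochMicros > MAX_MICROS) {
            throw new DateTimeException(
                "DATETIME_FIELD_OUT_OF_BOUNDS: epochMicros=" + epochMicros);
        }
    }
}
```

The bounds work out to -62135596800000000 and 253402300799999999 epoch micros, both well inside the int64 range, so the composite form can hold every value this check accepts.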
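
The int64 epoch-nanos limit and the "no silent wrap" requirement can be illustrated with a short Java sketch. {{EpochNanosConversion}} and {{toEpochNanos}} are hypothetical names; {{Math.multiplyExact}}, {{Math.addExact}} and {{Instant.ofEpochSecond(long, long)}} are standard JDK methods. The two helper instants are the ends of the int64 epoch-nanos window, which is where the ~1677-2262 range in the Problem section comes from.

```java
import java.time.Instant;

// Hypothetical sketch: converting the composite (epochMicros, nanosWithinMicro)
// form to a single int64 epoch-nanos value, as an int64-based sink would need.
final class EpochNanosConversion {
    static final long NANOS_PER_MICROS = 1000L;

    // multiplyExact/addExact throw ArithmeticException on overflow, whereas a
    // plain `epochMicros * NANOS_PER_MICROS` would wrap around silently.
    static long toEpochNanos(long epochMicros, int nanosWithinMicro) {
        return Math.addExact(Math.multiplyExact(epochMicros, NANOS_PER_MICROS), nanosWithinMicro);
    }

    // Earliest instant representable as int64 nanoseconds since the epoch.
    static Instant minInstant() {
        return Instant.ofEpochSecond(0L, Long.MIN_VALUE);
    }

    // Latest instant representable as int64 nanoseconds since the epoch.
    static Instant maxInstant() {
        return Instant.ofEpochSecond(0L, Long.MAX_VALUE);
    }
}
```

Here {{maxInstant()}} prints {{2262-04-11T23:47:16.854775807Z}} and {{minInstant()}} prints {{1677-09-21T00:12:43.145224192Z}}. Passing the epoch micros of 0001-01-01T00:00:00 UTC (-62135596800000000) to {{toEpochNanos}} throws {{ArithmeticException}}, while the plain multiplication would have returned a wrapped, wrong value.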



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
