[
https://issues.apache.org/jira/browse/SPARK-57838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk updated SPARK-57838:
-----------------------------
Shepherd: Max Gekk
> Harden overflow and calendar-range handling for nanosecond-precision
> timestamps
> -------------------------------------------------------------------------------
>
> Key: SPARK-57838
> URL: https://issues.apache.org/jira/browse/SPARK-57838
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
>
> This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
> precision).
> h2. Problem
> The SPIP flags overflow/range handling as a top risk, and the audit confirmed
> residual gaps:
> * {{TimestampNanosVal}} validates {{nanosWithinMicro}} but does not normalize
> carries ({{fromParts}} throws {{INTERNAL_ERROR}} on denormalized input).
> * Parse/cast overflow is swallowed as {{None}} or surfaced as
> {{CAST_INVALID_INPUT}} rather than {{DATETIME_FIELD_OUT_OF_BOUNDS}}.
> * There is no explicit 0001-9999 validation on the cast/parse/interval-add
> paths.
> * {{timestampNanosAddDayTime}} has no overflow wrapper (unlike
> {{timestampAdd}}).
> * A single int64 epoch-nanos value cannot represent the full 0001-9999 range
> (only about 1677-2262). Parquet/Arrow/Avro sinks fail loudly (good), but
> this composite-vs-int64 split is under-documented.
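For illustration only, the int64 range limit above can be checked with a few lines of Python. This is a sketch, not Spark code: the helper names {{instant_from_epoch_nanos}} and {{fits_int64_epoch_nanos}} are hypothetical, and Python's proleptic Gregorian {{datetime}} in UTC is assumed to stand in for the calendar used by the issue.

```python
from datetime import datetime, timedelta, timezone

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
NANOS_PER_MICROS = 1_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)


def instant_from_epoch_nanos(nanos: int) -> datetime:
    """Floor epoch nanoseconds to microseconds and convert to a UTC datetime."""
    return EPOCH + timedelta(microseconds=nanos // NANOS_PER_MICROS)


def fits_int64_epoch_nanos(ts: datetime) -> bool:
    """True if ts, expressed as nanoseconds since the epoch, fits in an int64."""
    epoch_micros = (ts - EPOCH) // ONE_MICRO  # exact integer microseconds
    return INT64_MIN <= epoch_micros * NANOS_PER_MICROS <= INT64_MAX


# The extreme instants a single int64 epoch-nanos value can encode
# (shown at microsecond resolution, the finest datetime supports).
lowest = instant_from_epoch_nanos(INT64_MIN)
highest = instant_from_epoch_nanos(INT64_MAX)
print(lowest.isoformat(), highest.isoformat())

# The calendar boundaries targeted by this sub-task do not fit.
min_ts = datetime(1, 1, 1, tzinfo=timezone.utc)
max_ts = datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc)
print(fits_int64_epoch_nanos(min_ts), fits_int64_epoch_nanos(max_ts))
```

The printed bounds fall in the years 1677 and 2262, and both calendar boundaries are reported as not representable, which is why the {{(micros, nanos)}} composite is needed for the full 0001-9999 range.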
> h2. Goal
> Consistent, well-typed overflow/range behavior with correct error classes,
> plus boundary test coverage.
> h2. Scope
> * Add explicit representable-range validation on parse/cast/interval-add,
> raising {{DATETIME_FIELD_OUT_OF_BOUNDS}} / {{ARITHMETIC_OVERFLOW}}.
> * Audit the remaining non-exact {{* NANOS_PER_MICROS}} multiplications on
> unbounded {{epochMicros}}.
> * Add min ({{0001-01-01T00:00:00.000000000}}) and max
> ({{9999-12-31T23:59:59.999999999}}) boundary tests across
> parse/format/cast/arithmetic.
> * Document the range split between int64 epoch-nanos and the
> {{(micros, nanos)}} composite for format consumers.
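A minimal Python sketch of what carry normalization plus explicit range validation could look like on the composite representation. Everything here is an assumption made for illustration: {{TimestampNanos}}, {{from_parts_normalized}}, and the {{DatetimeFieldOutOfBounds}} exception are hypothetical stand-ins, not the actual {{TimestampNanosVal}} / {{fromParts}} API or Spark's error framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

NANOS_PER_MICROS = 1_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)

# Epoch microseconds of 0001-01-01T00:00:00Z and 9999-12-31T23:59:59.999999Z.
MIN_MICROS = (datetime(1, 1, 1, tzinfo=timezone.utc) - EPOCH) // ONE_MICRO
MAX_MICROS = (
    datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc) - EPOCH
) // ONE_MICRO


class DatetimeFieldOutOfBounds(ValueError):
    """Hypothetical stand-in for the DATETIME_FIELD_OUT_OF_BOUNDS error class."""


@dataclass(frozen=True)
class TimestampNanos:
    """Hypothetical composite value: epoch micros plus 0..999 extra nanos."""
    epoch_micros: int
    nanos_within_micro: int


def from_parts_normalized(epoch_micros: int, nanos: int) -> TimestampNanos:
    """Carry whole microseconds out of `nanos`, then validate the 0001-9999 range."""
    # divmod floors, so negative nanos borrow from the microsecond field.
    carry, nanos_within_micro = divmod(nanos, NANOS_PER_MICROS)
    micros = epoch_micros + carry
    if not MIN_MICROS <= micros <= MAX_MICROS:
        raise DatetimeFieldOutOfBounds(
            f"timestamp ({micros} us, {nanos_within_micro} ns) is outside 0001-9999"
        )
    return TimestampNanos(micros, nanos_within_micro)
```

Under these assumptions, {{from_parts_normalized(0, 1500)}} yields {{(1, 500)}}, {{from_parts_normalized(MAX_MICROS, 999)}} is the largest accepted value, and one more nanosecond ({{from_parts_normalized(MAX_MICROS, 1000)}}) raises the bounds error instead of producing a denormalized or out-of-range value.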
> h2. Acceptance criteria
> * Out-of-range parse/cast raises the datetime bounds error
> ({{DATETIME_FIELD_OUT_OF_BOUNDS}}), not generic invalid input.
> * Arithmetic near the boundaries raises an overflow error
> ({{ARITHMETIC_OVERFLOW}}).
> * Boundary tests pass.
> * No silent wrap.
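To make the "no silent wrap" criterion concrete, the sketch below contrasts a wrapping int64 multiplication with a checked one, in the spirit of Java's {{Math.multiplyExact}}. It is written in Python, whose integers are unbounded, so the int64 wrap is emulated. The names {{wrap_int64}}, {{multiply_exact}}, and {{ArithmeticOverflow}} are hypothetical and only illustrate the behavior the criteria ask for.

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
NANOS_PER_MICROS = 1_000


class ArithmeticOverflow(ArithmeticError):
    """Hypothetical stand-in for the ARITHMETIC_OVERFLOW error class."""


def wrap_int64(x: int) -> int:
    """Emulate two's-complement int64 wraparound (what unchecked math does)."""
    return (x + 2**63) % 2**64 - 2**63


def multiply_exact(a: int, b: int) -> int:
    """Multiply, raising instead of wrapping when the result leaves int64."""
    result = a * b
    if not INT64_MIN <= result <= INT64_MAX:
        raise ArithmeticOverflow(f"{a} * {b} overflows int64")
    return result


# One microsecond past the largest value whose nanos still fit in int64.
epoch_micros = INT64_MAX // NANOS_PER_MICROS + 1

# Unchecked: the product silently wraps to a negative epoch-nanos value.
wrapped = wrap_int64(epoch_micros * NANOS_PER_MICROS)
print(wrapped < 0)

# Checked: the same multiplication is rejected with a typed error.
try:
    multiply_exact(epoch_micros, NANOS_PER_MICROS)
except ArithmeticOverflow as e:
    print("overflow:", e)
```

The unchecked path turns a timestamp in 2262 into a negative epoch-nanos value, while the checked path raises. This is the difference the audit of {{* NANOS_PER_MICROS}} multiplications is meant to catch.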
> h2. Testing
> {{DateTimeUtilsSuite}}, {{CastSuiteBase}}, {{DateExpressionsSuite}},
> {{TimestampNanosParseSuite}}.
> h2. Dependencies
> Cross-cutting - coordinate with the timestampadd/timestampdiff,
> to_timestamp*, sequence, and timestamp-subtraction sub-tasks (the
> arithmetic/parse paths it hardens); no hard blocker.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)