Max Gekk created SPARK-57838:
--------------------------------
Summary: Harden overflow and calendar-range handling for
nanosecond-precision timestamps
Key: SPARK-57838
URL: https://issues.apache.org/jira/browse/SPARK-57838
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
precision).
h2. Problem
The SPIP flags overflow and range handling as a top risk, and the audit
confirmed residual gaps:
* {{TimestampNanosVal}} validates {{nanosWithinMicro}} but does not normalize
carries; {{fromParts}} throws {{INTERNAL_ERROR}} on denormalized input.
* Parse/cast overflow is swallowed as {{None}} or surfaced as
{{CAST_INVALID_INPUT}} rather than {{DATETIME_FIELD_OUT_OF_BOUNDS}}.
* There is no explicit 0001-9999 validation on the cast, parse, and
interval-add paths.
* {{timestampNanosAddDayTime}} has no overflow wrapper (unlike
{{timestampAdd}}).
* A single int64 epoch-nanos value cannot represent the full 0001-9999 range;
it covers only about 1677-09-21 to 2262-04-11. Parquet/Arrow/Avro sinks fail
loudly (good), but this composite-vs-int64 split is under-documented.
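To make the range split concrete, here is a minimal Python sketch (not Spark code). It computes the epoch-microsecond bounds of the 0001-9999 calendar range and checks whether they fit in a single int64 count of nanoseconds. The helper names {{epoch_micros}} and {{fits_int64_epoch_nanos}} are hypothetical, and UTC with the proleptic Gregorian calendar is assumed.

```python
from datetime import datetime, timedelta, timezone

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NANOS_PER_MICROS = 1_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_micros(dt):
    """Exact microseconds since the Unix epoch (integer arithmetic only)."""
    return (dt - EPOCH) // timedelta(microseconds=1)


def fits_int64_epoch_nanos(micros, nanos_within_micro=0):
    """True if the composite (micros, nanos) value fits one int64 of epoch nanos."""
    nanos = micros * NANOS_PER_MICROS + nanos_within_micro  # Python ints never wrap
    return INT64_MIN <= nanos <= INT64_MAX


# Bounds of the calendar range, at microsecond resolution.
min_micros = epoch_micros(datetime(1, 1, 1, tzinfo=timezone.utc))
max_micros = epoch_micros(
    datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc))

# Approximate window that a single int64 of epoch nanos can address.
lo = EPOCH + timedelta(microseconds=INT64_MIN // NANOS_PER_MICROS)
hi = EPOCH + timedelta(microseconds=INT64_MAX // NANOS_PER_MICROS)

print(min_micros, max_micros)
print(lo.year, hi.year)
print(fits_int64_epoch_nanos(min_micros), fits_int64_epoch_nanos(max_micros, 999))
```

Under these assumptions both calendar bounds fall outside the int64 window, which is why an int64-based sink has to reject such values instead of wrapping.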
h2. Goal
Consistent, well-typed overflow/range behavior with correct error classes, plus
boundary test coverage.
h2. Scope
* Add explicit representable-range validation on parse/cast/interval-add,
raising {{DATETIME_FIELD_OUT_OF_BOUNDS}} / {{ARITHMETIC_OVERFLOW}}.
* Audit remaining non-exact {{* NANOS_PER_MICROS}} multiplications on unbounded
{{epochMicros}}.
* Add min ({{0001-01-01T00:00:00.000000000}}) and max
({{9999-12-31T23:59:59.999999999}}) boundary tests across
parse/format/cast/arithmetic.
* Document the int64-epoch-nanos vs {{(micros, nanos)}} composite range split
for format consumers.
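The validation described in this scope could look roughly like the Python sketch below. It is illustrative only, not the Spark implementation. {{TimestampRangeError}}, {{normalize}}, and {{add_nanos}} are hypothetical names. The mapping of int64 overflow to {{ARITHMETIC_OVERFLOW}} and of calendar-range violations to {{DATETIME_FIELD_OUT_OF_BOUNDS}} is an assumption about how the two error classes would be split.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
NANOS_PER_MICROS = 1_000
MIN_MICROS = -62_135_596_800_000_000  # 0001-01-01T00:00:00Z
MAX_MICROS = 253_402_300_799_999_999  # 9999-12-31T23:59:59.999999Z


class TimestampRangeError(ValueError):
    """Hypothetical error carrying a Spark-style error class name."""

    def __init__(self, error_class, detail):
        super().__init__(f"[{error_class}] {detail}")
        self.error_class = error_class


def normalize(micros, nanos):
    """Carry whole microseconds out of `nanos`, leaving 0 <= nanos < 1000."""
    carry, nanos_within_micro = divmod(nanos, NANOS_PER_MICROS)  # floor semantics
    return micros + carry, nanos_within_micro


def add_nanos(micros, nanos_within_micro, delta_nanos):
    """Add a nanosecond delta to a composite timestamp with explicit checks."""
    m, n = normalize(micros, nanos_within_micro + delta_nanos)
    # Python ints are unbounded, so the int64 check emulates a checked add.
    if not INT64_MIN <= m <= INT64_MAX:
        raise TimestampRangeError("ARITHMETIC_OVERFLOW", "epochMicros exceeds int64")
    # Assumed policy: anything outside 0001-9999 is a datetime bounds error.
    if not MIN_MICROS <= m <= MAX_MICROS:
        raise TimestampRangeError(
            "DATETIME_FIELD_OUT_OF_BOUNDS", "timestamp outside 0001-9999")
    return m, n
```

For example, {{add_nanos(0, 999, 1)}} carries into the microsecond part and returns {{(1, 0)}}, while adding one nanosecond to the maximum value raises the bounds error instead of wrapping.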
h2. Acceptance criteria
* Out-of-range parse/cast raises the datetime bounds error (not generic invalid
input).
* Arithmetic that crosses the boundaries raises an overflow error.
* Boundary tests pass.
* No silent wrap.
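As an illustration of the first criterion, the Python sketch below separates malformed input from well-formed but out-of-range input. It is a simplified stand-in rather than Spark's parser: {{parse_timestamp_nanos}} and {{ParseError}} are hypothetical, only one fixed ISO-like layout in UTC is accepted, and treating any invalid calendar field as a bounds error is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Year may have more than 4 digits so that "10000-..." is well-formed but out of range.
_PATTERN = re.compile(
    r"(\d{4,})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,9}))?")


class ParseError(ValueError):
    """Hypothetical error carrying a Spark-style error class name."""

    def __init__(self, error_class, text):
        super().__init__(f"[{error_class}] {text!r}")
        self.error_class = error_class


def parse_timestamp_nanos(text):
    """Parse to (epoch_micros, nanos_within_micro), classifying failures."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        # Not shaped like a timestamp at all: generic invalid input.
        raise ParseError("CAST_INVALID_INPUT", text)
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    fraction_nanos = int((match.group(7) or "").ljust(9, "0"))
    if not 1 <= year <= 9999:
        raise ParseError("DATETIME_FIELD_OUT_OF_BOUNDS", text)
    try:
        dt = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    except ValueError:
        # Well-formed, but a field such as month or day is out of range.
        raise ParseError("DATETIME_FIELD_OUT_OF_BOUNDS", text) from None
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return (micros + fraction_nanos // 1_000, fraction_nanos % 1_000)
```

With this split, {{10000-01-01T00:00:00}} is reported as a bounds error, while a string such as {{not a timestamp}} still surfaces as invalid input.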
h2. Testing
{{DateTimeUtilsSuite}}, {{CastSuiteBase}}, {{DateExpressionsSuite}},
{{TimestampNanosParseSuite}}.
h2. Dependencies
Cross-cutting - coordinate with the timestampadd/timestampdiff, to_timestamp*,
sequence, and timestamp-subtraction sub-tasks, whose arithmetic and parse paths
this sub-task hardens. No hard blocker.