Max Gekk created SPARK-57303:
--------------------------------
Summary: Store-assignment and up-cast rules for
nanosecond-precision timestamp types
Key: SPARK-57303
URL: https://issues.apache.org/jira/browse/SPARK-57303
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. Background
Nanosecond-precision timestamp types (TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p), p in
[7, 9], backed by TimestampNanosVal) now support casting to/from strings
(SPARK-57211, SPARK-57256) and to/from their microsecond counterparts
(SPARK-57293). However, there are no store-assignment or up-cast rules tailored
to these types:
* They fall through the generic {{(_: DatetimeType, _: DatetimeType)}} arm in
{{Cast.canANSIStoreAssign}}, so ANSI store assignment would silently truncate
sub-microsecond digits (SPARK-57293 blocks the truncation only for the
micros <-> nanos pair).
* They are absent from {{UpCastRule.canUpCast}}, so STRICT store assignment and
up-cast resolution reject even lossless widening.
h2. Goal
Define a complete, precision-safe store-assignment / up-cast contract for the
whole LTZ/NTZ timestamp family across micro and nanosecond precisions:
* STRICT policy ({{Cast.canUpCast}}): allow lossless widening, reject lossy
narrowing.
* ANSI policy ({{Cast.canANSIStoreAssign}}): allow widening, block lossy
narrowing so it can never silently truncate.
* LEGACY policy and explicit CAST are unchanged (they still truncate on
narrowing).
h2. Rule
Introduce a single notion of effective fractional-second precision for the
LTZ/NTZ timestamp family:
* {{TimestampType}} (LTZ micros) and {{TimestampNTZType}} (NTZ micros) -> 6
* {{TimestampLTZNanosType(p)}} / {{TimestampNTZNanosType(p)}} -> p (7, 8, or 9)
For any ordered pair of timestamp-family types (including across the LTZ/NTZ
boundary, which Spark already treats as a mutual up-cast for the micros types):
* target precision >= source precision: lossless (equal precision or widening)
-> up-cast (allowed under both STRICT and ANSI)
* target precision < source precision: lossy narrowing -> not an up-cast;
rejected under STRICT and blocked under ANSI
This deliberately diverges from the existing {{TimeType(p)}} model, which adds
no widening to {{canUpCast}} and allows silent narrowing under ANSI. The
stricter, precision-safe behavior is intentional.
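The rule above can be sketched as a small standalone Scala model. This is only an illustration under stated assumptions: the case classes below are hypothetical stand-ins for Spark's {{DataType}} classes, {{TsRule}} and {{isLossless}} are made-up names, and the signature given to {{tsFractionalPrecision}} is a guess at the helper named in this ticket, not the actual patch.

```scala
// Standalone model of the precision rule. These types are stand-ins for
// illustration only; they are NOT Spark's DataType classes.
sealed trait TsType
case object TimestampLTZMicros extends TsType // models TimestampType
case object TimestampNTZMicros extends TsType // models TimestampNTZType
final case class TimestampLTZNanos(p: Int) extends TsType {
  require(p >= 7 && p <= 9, s"precision must be in [7, 9], got $p")
}
final case class TimestampNTZNanos(p: Int) extends TsType {
  require(p >= 7 && p <= 9, s"precision must be in [7, 9], got $p")
}

object TsRule {
  // Effective fractional-second precision: 6 for the micros types, p for nanos.
  def tsFractionalPrecision(t: TsType): Int = t match {
    case TimestampLTZMicros | TimestampNTZMicros => 6
    case TimestampLTZNanos(p) => p
    case TimestampNTZNanos(p) => p
  }

  // True when no fractional digits can be lost, i.e. the target precision is
  // at least the source precision. The LTZ/NTZ family is ignored on purpose,
  // since the rule applies across the LTZ/NTZ boundary as well.
  def isLossless(from: TsType, to: TsType): Boolean =
    tsFractionalPrecision(to) >= tsFractionalPrecision(from)
}
```

In this model, {{TsRule.isLossless(TimestampNTZMicros, TimestampLTZNanos(9))}} holds (6 -> 9), while {{TsRule.isLossless(TimestampNTZNanos(9), TimestampNTZMicros)}} does not (9 -> 6).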
h2. Scope
All 8 LTZ/NTZ timestamp types: TIMESTAMP, TIMESTAMP_NTZ, and TIMESTAMP_LTZ(p) /
TIMESTAMP_NTZ(p) for p in [7, 9], including cross-family (LTZ <-> NTZ) pairs.
h2. Approach
* {{UpCastRule.canUpCast}}: add a {{tsFractionalPrecision}} helper and a single
lossless-widening arm (subsuming the existing TimestampType <->
TimestampNTZType cases).
* {{Cast.canANSIStoreAssign}}: generalize the SPARK-57293 narrowing block to
reject all timestamp-family narrowing via the same precision helper, before the
generic DatetimeType arm. DATE/TIME and equal-precision LTZ<->NTZ conversions
are unaffected.
h2. Dependencies
The rule layer is intentionally written ahead of the casts. Store assignment
only succeeds if the inserted {{Cast}} resolves, so each allowed pair needs its
{{canCast}} / {{canAnsiCast}} arm:
* Already exist: string <-> nanos; micros <-> nanos same-family (SPARK-57293).
* Required by separate subtasks before the corresponding rule actually permits
a write: nanos(p1) <-> nanos(p2) precision change; cross-family LTZ(p) <->
NTZ(p) and cross-family micros <-> nanos.
Until those casts merge, the new entries are dormant for the unimplemented
pairs (a write fails at cast creation, not at the policy check).
h2. Out of scope
* The precision-to-precision and cross-family casts themselves (separate
subtasks).
* Implicit type coercion: {{findWiderDateTimeType}} has no arms for the nanos
types and currently throws a MatchError for nanos datetime pairs (e.g. UNION of
TIMESTAMP_NTZ and TIMESTAMP_NTZ(9)); tracked as a companion fix.
* DATE <-> nanos conversions.
h2. Testing
* Update the SPARK-57293 store-assignment/up-cast contract test (widening
canUpCast assertions flip from false to true).
* Add a full-matrix predicate test over all 8 timestamp types: {{canUpCast}}
and {{canANSIStoreAssign}} are true iff target precision >= source precision.
Also add anchors asserting that TIMESTAMP -> DATE stays allowed under ANSI and
DATE -> TIMESTAMP stays an up-cast.
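As a rough sketch of the shape of the full-matrix test, the expected outcomes can be enumerated without Spark at all. Everything here is an assumption for illustration: {{TimestampMatrixSketch}}, {{allTypes}}, and {{expected}} are hypothetical names, types are modeled as (SQL name, precision) pairs, and a real test would compare these expectations against {{Cast.canUpCast}} and {{Cast.canANSIStoreAssign}} on actual {{DataType}} instances.

```scala
object TimestampMatrixSketch {
  // The 8 timestamp-family types, modeled as (SQL name, effective precision).
  val allTypes: Seq[(String, Int)] =
    Seq("TIMESTAMP" -> 6, "TIMESTAMP_NTZ" -> 6) ++
      (7 to 9).flatMap { p =>
        Seq(s"TIMESTAMP_LTZ($p)" -> p, s"TIMESTAMP_NTZ($p)" -> p)
      }

  // Expected value of both predicates for a (source, target) precision pair.
  def expected(fromPrecision: Int, toPrecision: Int): Boolean =
    toPrecision >= fromPrecision

  // Full 8 x 8 matrix of (source, target, expected outcome), including the
  // identity pairs and the cross-family (LTZ <-> NTZ) pairs.
  val matrix: Seq[(String, String, Boolean)] =
    for {
      (from, fromP) <- allTypes
      (to, toP) <- allTypes
    } yield (from, to, expected(fromP, toP))
}
```

Iterating over {{TimestampMatrixSketch.matrix}} gives one assertion per ordered pair, so a missing or mis-ordered arm in either predicate shows up as a named failing pair rather than a single aggregate failure.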