[ https://issues.apache.org/jira/browse/SPARK-57303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk reassigned SPARK-57303:
--------------------------------
Assignee: Max Gekk
> Store-assignment and up-cast rules for nanosecond-precision timestamp types
> ---------------------------------------------------------------------------
>
> Key: SPARK-57303
> URL: https://issues.apache.org/jira/browse/SPARK-57303
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. Background
> Nanosecond-precision timestamp types (TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p), p
> in [7, 9], backed by TimestampNanosVal) now support casting to/from strings
> (SPARK-57211, SPARK-57256) and to/from their microsecond counterparts
> (SPARK-57293). However, there are no store-assignment or up-cast rules
> tailored to these types:
> * They fall through the generic {{(_: DatetimeType, _: DatetimeType)}} arm in
> {{Cast.canANSIStoreAssign}}, so ANSI store assignment would silently truncate
> sub-microsecond digits (SPARK-57293 guards against this only for the
> micros<->nanos pair).
> * They are absent from {{UpCastRule.canUpCast}}, so STRICT store assignment
> and up-cast resolution reject even lossless widening.
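The following plain-Scala sketch shows concretely what "silently truncate" means. For illustration only, it assumes a timestamp is a single count of nanoseconds since the epoch, which is not how TimestampNanosVal is laid out. It also assumes narrowing floors toward the earlier instant. Keeping t fractional-second digits of a 9-digit value drops the low 9 - t digits. `SubMicroTruncation` and its methods are hypothetical names.

```scala
object SubMicroTruncation {
  // Hypothetical model: a timestamp is a count of nanoseconds since the epoch.
  // This is NOT the TimestampNanosVal layout; it only shows which digits are lost.

  // 10^n as a Long, for 0 <= n <= 9.
  private def pow10(n: Int): Long = List.fill(n)(10L).product

  // Keep only `targetPrecision` fractional-second digits of a nanosecond value,
  // dropping the rest (floor semantics are an assumption of this sketch).
  def narrow(epochNanos: Long, targetPrecision: Int): Long = {
    require(targetPrecision >= 0 && targetPrecision <= 9, s"bad precision $targetPrecision")
    val unit = pow10(9 - targetPrecision)
    Math.floorDiv(epochNanos, unit) * unit
  }

  // True when narrowing to `targetPrecision` would change the stored value.
  def losesDigits(epochNanos: Long, targetPrecision: Int): Boolean =
    narrow(epochNanos, targetPrecision) != epochNanos
}
```

For example, narrowing 1700000000123456789 ns to precision 6 yields 1700000000123456000 ns. The trailing 789 ns are discarded without any error, which is the outcome the ANSI rule is meant to block.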
> h2. Goal
> Define a complete, precision-safe store-assignment / up-cast contract for the
> whole LTZ/NTZ timestamp family across microsecond and nanosecond precisions:
> * STRICT policy ({{Cast.canUpCast}}): allow lossless widening, reject lossy
> narrowing.
> * ANSI policy ({{Cast.canANSIStoreAssign}}): allow widening, block lossy
> narrowing so it can never silently truncate.
> * LEGACY policy and explicit CAST are unchanged (they still truncate on
> narrowing).
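The three policies can be summarized as a small decision table. The sketch below is a hypothetical model, not Spark code. It reduces each timestamp-family type to its effective fractional-second precision (6 for the microsecond types, p for the nanosecond types). It also assumes the required cast exists, so LEGACY always accepts the write and only truncates when narrowing.

```scala
object PolicyContract {
  // The three store-assignment policies discussed in this issue.
  sealed trait Policy
  case object Strict extends Policy
  case object Ansi extends Policy
  case object Legacy extends Policy

  // Possible results of assigning one timestamp-family type to another.
  sealed trait Outcome
  case object Rejected extends Outcome   // write refused at analysis time
  case object Lossless extends Outcome   // widening or equal precision
  case object Truncating extends Outcome // accepted, extra fractional digits dropped

  // srcPrecision / tgtPrecision: effective fractional-second precision.
  def storeAssign(policy: Policy, srcPrecision: Int, tgtPrecision: Int): Outcome = {
    val widening = tgtPrecision >= srcPrecision
    policy match {
      case Strict | Ansi => if (widening) Lossless else Rejected
      case Legacy        => if (widening) Lossless else Truncating // unchanged behavior
    }
  }
}
```

Within the timestamp family, STRICT and ANSI give the same answer under this contract. The two policies still differ for other type pairs, which this model does not cover.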
> h2. Rule
> Introduce a single notion of effective fractional-second precision for the
> LTZ/NTZ timestamp family:
> * {{TimestampType}} (LTZ micros) and {{TimestampNTZType}} (NTZ micros) -> 6
> * {{TimestampLTZNanosType(p)}} / {{TimestampNTZNanosType(p)}} -> p (7, 8, or
> 9)
> For any ordered pair of timestamp-family types (including across the LTZ/NTZ
> boundary, which Spark already treats as a mutual up-cast for the micros
> types):
> * target precision >= source precision: lossless widening -> up-cast (allowed
> under both STRICT and ANSI)
> * target precision < source precision: lossy narrowing -> not an up-cast;
> blocked under ANSI
> This deliberately diverges from the existing TimeType(p) model, which adds no
> widening rules to canUpCast and allows silent narrowing under ANSI; the
> divergence is intentional, in favor of precision safety.
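Below is a minimal sketch of the rule. It uses hypothetical stand-in types for TimestampType, TimestampNTZType, TimestampLTZNanosType(p) and TimestampNTZNanosType(p) rather than Spark's real classes. The helper borrows the name tsFractionalPrecision from the Approach section, but its body is an illustration, not the actual patch.

```scala
object TimestampPrecisionRule {
  // Hypothetical stand-ins for the eight LTZ/NTZ types (not Spark's classes).
  sealed trait TsType
  case object LtzMicros extends TsType // TIMESTAMP
  case object NtzMicros extends TsType // TIMESTAMP_NTZ
  final case class LtzNanos(p: Int) extends TsType { // TIMESTAMP_LTZ(p)
    require(p >= 7 && p <= 9, s"unsupported precision $p")
  }
  final case class NtzNanos(p: Int) extends TsType { // TIMESTAMP_NTZ(p)
    require(p >= 7 && p <= 9, s"unsupported precision $p")
  }

  // Effective fractional-second precision: 6 for micros, p for nanos.
  def tsFractionalPrecision(t: TsType): Int = t match {
    case LtzMicros | NtzMicros => 6
    case LtzNanos(p)           => p
    case NtzNanos(p)           => p
  }

  // One lossless-widening check for every ordered pair, LTZ/NTZ boundary included.
  def canUpCast(from: TsType, to: TsType): Boolean =
    tsFractionalPrecision(to) >= tsFractionalPrecision(from)
}
```

The LTZ/NTZ distinction never enters the comparison, so one comparison handles cross-family pairs such as TIMESTAMP_NTZ -> TIMESTAMP_LTZ(9). The existing TimestampType <-> TimestampNTZType mutual up-cast falls out as the 6 >= 6 case.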
> h2. Scope
> All 8 LTZ/NTZ timestamp types: TIMESTAMP, TIMESTAMP_NTZ, and TIMESTAMP_LTZ(p)
> / TIMESTAMP_NTZ(p) for p in [7, 9], including cross-family (LTZ <-> NTZ)
> pairs.
> h2. Approach
> * {{UpCastRule.canUpCast}}: add a {{tsFractionalPrecision}} helper and a
> single lossless-widening arm (subsuming the existing TimestampType <->
> TimestampNTZType cases).
> * {{Cast.canANSIStoreAssign}}: generalize the SPARK-57293 narrowing block to
> reject all timestamp-family narrowing via the same precision helper, placed
> before the generic DatetimeType arm so that arm cannot shadow it. DATE/TIME and
> equal-precision LTZ<->NTZ conversions are unaffected.
> h2. Dependencies
> The rule layer is intentionally written ahead of the casts. Store assignment
> only succeeds if the inserted Cast resolves, so each allowed pair also needs a
> matching canCast/canAnsiCast arm:
> * Already exist: string <-> nanos; micros <-> nanos same-family (SPARK-57293).
> * Still required, via separate subtasks, before the corresponding rule can
> actually permit a write: nanos(p1) <-> nanos(p2) precision changes,
> cross-family LTZ(p) <-> NTZ(p), and cross-family micros <-> nanos.
> Until those casts merge, the new entries are dormant for the unimplemented
> pairs (a write fails at cast creation, not at the policy check).
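The two gates described above (policy check, then Cast resolution) can be modeled as follows. This is a hypothetical sketch. The cast-availability table encodes only the state listed in this section: same-family micros <-> nanos casts exist, along with the already-supported TIMESTAMP <-> TIMESTAMP_NTZ cast. String casts are not modeled, and the narrowing gate reflects STRICT/ANSI only.

```scala
object WriteResolution {
  // Outcome of a STRICT/ANSI store-assignment write in this hypothetical model.
  sealed trait Result
  case object PolicyRejected extends Result  // blocked by the narrowing rule
  case object CastUnavailable extends Result // policy allows it, Cast does not resolve yet
  case object Written extends Result

  // A timestamp-family type: family is "LTZ" or "NTZ"; precision is 6 (micros) or 7-9.
  final case class Ts(family: String, precision: Int)

  // Casts available per this issue's Dependencies list (string casts not modeled).
  def castExists(from: Ts, to: Ts): Boolean = {
    val bothMicros = from.precision == 6 && to.precision == 6
    val sameFamilyMicrosNanos =
      from.family == to.family && from.precision != to.precision &&
        (from.precision == 6 || to.precision == 6)
    from == to || bothMicros || sameFamilyMicrosNanos
  }

  // Gate 1: the policy check. Gate 2: resolving the inserted Cast.
  def write(from: Ts, to: Ts): Result =
    if (to.precision < from.precision) PolicyRejected
    else if (!castExists(from, to)) CastUnavailable
    else Written
}
```

In this model, TIMESTAMP_NTZ(7) -> TIMESTAMP_NTZ(9) passes the policy check but returns CastUnavailable until the precision-change cast lands. The write fails at cast creation, not at the policy check.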
> h2. Out of scope
> * The precision-to-precision and cross-family casts themselves (separate
> subtasks).
> * Implicit type coercion: {{findWiderDateTimeType}} has no arms for the nanos
> types and currently throws a MatchError for nanos datetime pairs (e.g. UNION
> of TIMESTAMP_NTZ and TIMESTAMP_NTZ(9)); tracked as a companion fix.
> * DATE <-> nanos conversions.
> h2. Testing
> * Update the SPARK-57293 store-assignment/up-cast contract test (widening
> canUpCast assertions flip from false to true).
> * Add a full-matrix predicate test over all 8 timestamp types asserting that
> canUpCast and canANSIStoreAssign are true iff target precision >= source
> precision, plus anchor assertions that TIMESTAMP -> DATE stays allowed under
> ANSI and DATE -> TIMESTAMP stays an up-cast.
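Here is a plain-Scala sketch of the full-matrix shape. It uses (name, precision) tuples as hypothetical stand-ins for Spark's DataType values and no test framework. `mismatches` runs a predicate over all 64 ordered pairs and reports where it disagrees with the contract. `before` is only a rough approximation of the pre-change up-cast behavior (identity plus the micros LTZ <-> NTZ cases), included to show the check catches real differences.

```scala
object FullMatrixCheck {
  type Ts = (String, Int) // (SQL name, effective fractional-second precision)

  // The eight in-scope types: 2 micros types plus precisions 7-9 in each family.
  val all: Seq[Ts] =
    Seq("TIMESTAMP" -> 6, "TIMESTAMP_NTZ" -> 6) ++
      (7 to 9).flatMap(p => Seq(s"TIMESTAMP_LTZ($p)" -> p, s"TIMESTAMP_NTZ($p)" -> p))

  // Expected contract: allowed iff target precision >= source precision.
  def expected(from: Ts, to: Ts): Boolean = to._2 >= from._2

  // Number of ordered pairs the contract treats as lossless widening.
  def wideningCount: Int = all.map(f => all.count(t => expected(f, t))).sum

  // Runs a predicate over every ordered pair and returns the disagreeing pairs.
  def mismatches(pred: (Ts, Ts) => Boolean): Seq[(String, String)] =
    for {
      from <- all
      to <- all
      if pred(from, to) != expected(from, to)
    } yield (from._1, to._1)

  // Rough stand-in for pre-change up-cast behavior: identity plus micros LTZ <-> NTZ.
  def before(from: Ts, to: Ts): Boolean = from == to || (from._2 == 6 && to._2 == 6)

  // Stand-in for the proposed rule.
  def after(from: Ts, to: Ts): Boolean = to._2 >= from._2
}
```

Under the stated rule, 40 of the 64 ordered pairs (including the 8 identity pairs) are widening and the other 24 are narrowing. In this model, the rough `before` predicate disagrees with the contract on 30 of the widening pairs.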
--
This message was sent by Atlassian Jira
(v8.20.10#820010)