[
https://issues.apache.org/jira/browse/SPARK-57293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk updated SPARK-57293:
-----------------------------
Affects Version/s: 4.3.0 (was: 5.0.0)
> Cast between nanosecond-precision and microsecond-precision timestamp types
> ---------------------------------------------------------------------------
>
> Key: SPARK-57293
> URL: https://issues.apache.org/jira/browse/SPARK-57293
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Attachments: SPARK-57293-plan.md
>
>
> h3. Background
> Nanosecond-precision timestamp types ({{TIMESTAMP_NTZ(p)}} /
> {{TIMESTAMP_LTZ(p)}}, with {{p}} in [7, 9], backed by {{TimestampNanosVal}})
> currently support parsing from strings (SPARK-57211) and rendering to strings
> (SPARK-57256). There is no cast between a nanosecond-precision type and its
> microsecond-precision counterpart, so values cannot move between
> {{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_NTZ}}, or between {{TIMESTAMP_LTZ(p)}}
> and {{TIMESTAMP_LTZ}}.
> h3. Goal
> Support explicit casts in all four directions (two type pairs, each cast
> both ways), in both the interpreted and codegen paths:
> * {{TIMESTAMP_NTZ}} -> {{TIMESTAMP_NTZ(p)}} and back
> * {{TIMESTAMP_LTZ}} -> {{TIMESTAMP_LTZ(p)}} and back
> h3. Semantics
> Both directions stay within a single zone family, so they are pure
> representation conversions with no timezone involvement:
> * Widening (micros -> nanos): {{nanosWithinMicro}} is set to 0; lossless and
> independent of the target precision {{p}}.
> * Narrowing (nanos -> micros): take {{epochMicros}}, dropping the
> sub-microsecond digits. This truncates toward the past (floor), consistent
> with how microsecond timestamps are already produced. The truncation is
> silent in both ANSI and non-ANSI modes, matching Spark's existing silent
> fractional-second truncation for timestamp casts.
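
The widening and narrowing rules above can be modelled outside Spark with a small sketch. This is an illustration only, not the Spark implementation: the `TimestampNanos` class is a hypothetical stand-in for {{TimestampNanosVal}}, and it assumes {{nanosWithinMicro}} always lies in [0, 999], which is what makes "take {{epochMicros}}" equivalent to a floor.

```python
from dataclasses import dataclass

NANOS_PER_MICRO = 1_000


@dataclass(frozen=True)
class TimestampNanos:
    """Hypothetical stand-in for TimestampNanosVal (not Spark's class)."""
    epoch_micros: int        # whole microseconds since the epoch
    nanos_within_micro: int  # assumption: always in [0, 999]

    @staticmethod
    def from_epoch_nanos(epoch_nanos: int) -> "TimestampNanos":
        # divmod floors toward negative infinity, so the remainder
        # stays in [0, 999] even for pre-epoch (negative) values.
        micros, nanos = divmod(epoch_nanos, NANOS_PER_MICRO)
        return TimestampNanos(micros, nanos)


def widen(epoch_micros: int) -> TimestampNanos:
    """micros -> nanos: lossless, independent of the target precision p."""
    return TimestampNanos(epoch_micros, 0)


def narrow(ts: TimestampNanos) -> int:
    """nanos -> micros: keep epoch_micros, drop the sub-microsecond digits."""
    return ts.epoch_micros
```

Under this model, a value one nanosecond before the epoch is stored as (-1, 999), so narrowing it yields -1 microsecond, which is the earlier of the two neighbouring microsecond values. Widening followed by narrowing returns the original microsecond value unchanged.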
> h3. Approach
> Wire the four casts into {{Cast}}: register them in
> {{canCast}}/{{canAnsiCast}}, add interpreted cases to
> {{castToTimestamp}}/{{castToTimestampNTZ}}/{{castToTimestampLTZNanos}}/{{castToTimestampNTZNanos}},
> and mirror them in the corresponding codegen helpers. No new
> {{Cast.needsTimeZone}} entries are required, because no cast crosses zone
> families. The preview flag {{spark.sql.timestampNanosTypes.enabled}}
> continues to gate the nanosecond-typed side.
> h3. Out of scope
> * Precision-to-precision casts within the nanosecond family
> ({{TIMESTAMP_NTZ(p1)}} -> {{TIMESTAMP_NTZ(p2)}}).
> * Cross-family casts ({{TIMESTAMP_LTZ(p)}} <-> {{TIMESTAMP_NTZ(p)}}), which
> would require timezone handling.
> * Implicit/up-cast and store-assignment coercion; these casts remain
> explicit-only, consistent with the existing string<->nanos casts.
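
To make the boundary between the Goal and the Out of scope items concrete, here is a minimal sketch of a predicate that accepts only the casts this ticket covers. The names `TsType` and `in_ticket_scope` are hypothetical and do not correspond to anything in {{Cast}}; a precision of `None` is used here to denote the microsecond-precision type.

```python
from typing import NamedTuple, Optional


class TsType(NamedTuple):
    """Hypothetical description of a timestamp type for this sketch."""
    family: str               # "NTZ" or "LTZ"
    precision: Optional[int]  # None = microsecond type, 7..9 = nanosecond type


def in_ticket_scope(src: TsType, dst: TsType) -> bool:
    """True only for micros <-> nanos casts within one zone family."""
    if src.family != dst.family:
        # Cross-family casts would need timezone handling: out of scope.
        return False
    # Exactly one side must be nanosecond-typed; this also rejects
    # precision-to-precision casts such as NTZ(7) -> NTZ(9).
    return (src.precision is None) != (dst.precision is None)
```

For example, `in_ticket_scope(TsType("NTZ", None), TsType("NTZ", 9))` is true, while both `TsType("NTZ", 7)` to `TsType("NTZ", 9)` and `TsType("LTZ", 9)` to `TsType("NTZ", 9)` are rejected.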
> h3. Testing
> Add coverage in {{CastSuiteBase}} (widening, narrowing/truncation,
> round-trip, null), exercised with ANSI mode on and off and in both the
> interpreted and codegen variants; optionally add end-to-end golden coverage
> in {{cast.sql}}.
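
As a rough way to size the coverage described above, the scenarios can be crossed with the two execution dimensions. This sketch only enumerates combinations; the function name `coverage_matrix` is hypothetical, and it does not reflect how {{CastSuiteBase}} actually parameterizes its tests.

```python
from itertools import product

ANSI_MODES = (True, False)
EVAL_PATHS = ("interpreted", "codegen")
SCENARIOS = ("widening", "narrowing/truncation", "round-trip", "null")


def coverage_matrix():
    """Enumerate every (ANSI mode, evaluation path, scenario) combination."""
    return [
        {"ansi": ansi, "path": path, "scenario": scenario}
        for ansi, path, scenario in product(ANSI_MODES, EVAL_PATHS, SCENARIOS)
    ]
```

This yields 2 x 2 x 4 = 16 combinations per type pair, before multiplying by the zone family (NTZ, LTZ) and any choice of precision {{p}}.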