[ https://issues.apache.org/jira/browse/SPARK-57293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-57293:
-----------------------------
    Affects Version/s: 4.3.0
                           (was: 5.0.0)

> Cast between nanosecond-precision and microsecond-precision timestamp types
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-57293
>                 URL: https://issues.apache.org/jira/browse/SPARK-57293
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>         Attachments: SPARK-57293-plan.md
>
>
> h3. Background
> Nanosecond-precision timestamp types ({{TIMESTAMP_NTZ(p)}} / 
> {{TIMESTAMP_LTZ(p)}}, with {{p}} in [7, 9], backed by {{TimestampNanosVal}}) 
> currently support parsing from strings (SPARK-57211) and rendering to strings 
> (SPARK-57256). There is no cast between a nanosecond-precision type and its 
> microsecond-precision counterpart, so values cannot move between 
> {{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_NTZ}}, or between {{TIMESTAMP_LTZ(p)}} 
> and {{TIMESTAMP_LTZ}}.
> h3. Goal
> Support explicit casts in the following four directions, in both the
> interpreted and codegen paths:
> * {{TIMESTAMP_NTZ}} -> {{TIMESTAMP_NTZ(p)}} and back
> * {{TIMESTAMP_LTZ}} -> {{TIMESTAMP_LTZ(p)}} and back
> h3. Semantics
> Both directions stay within a single zone family, so they are pure 
> representation conversions with no timezone involvement:
> * Widening (micros -> nanos): {{nanosWithinMicro}} is set to 0; lossless and 
> independent of the target precision {{p}}.
> * Narrowing (nanos -> micros): keep {{epochMicros}} and drop 
> {{nanosWithinMicro}} (the sub-microsecond digits). This truncates toward the 
> past (floor), consistent with how microsecond timestamps are already 
> produced, and is silent in both ANSI and non-ANSI modes (matching Spark's 
> existing silent fractional-second truncation for timestamp casts).
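The widening and narrowing rules above can be modeled in a few lines. This is a minimal Python sketch, not Spark code: {{NanosTimestamp}}, {{widen}}, {{narrow}} and {{from_epoch_nanos}} are hypothetical names standing in for {{TimestampNanosVal}} and the {{Cast}} logic, and it assumes {{nanosWithinMicro}} is always in [0, 999]. Under that assumption, keeping {{epochMicros}} is exactly a floor, including for instants before 1970.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NanosTimestamp:
    """Hypothetical stand-in for TimestampNanosVal: microseconds since the
    epoch plus extra nanoseconds, assumed to lie in [0, 999]."""
    epoch_micros: int
    nanos_within_micro: int

    def __post_init__(self) -> None:
        if not 0 <= self.nanos_within_micro <= 999:
            raise ValueError("nanos_within_micro must be in [0, 999]")


def widen(epoch_micros: int) -> NanosTimestamp:
    """micros -> nanos: lossless; the target precision p plays no role."""
    return NanosTimestamp(epoch_micros, 0)


def narrow(ts: NanosTimestamp) -> int:
    """nanos -> micros: keep epoch_micros and drop the sub-microsecond part.
    Because nanos_within_micro is non-negative, this floors toward the past."""
    return ts.epoch_micros


def from_epoch_nanos(epoch_nanos: int) -> NanosTimestamp:
    """Hypothetical helper: split a nanosecond count with floor division so
    the remainder stays in [0, 999] even for pre-1970 instants."""
    micros, nanos = divmod(epoch_nanos, 1000)
    return NanosTimestamp(micros, nanos)
```

For example, one nanosecond before the epoch is stored as {{(-1, 999)}}, so narrowing it yields {{-1}} microseconds (1969-12-31 23:59:59.999999) rather than {{0}}.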
> h3. Approach
> Wire the four cast directions into {{Cast}}: register them in 
> {{canCast}}/{{canAnsiCast}}, add interpreted cases to 
> {{castToTimestamp}}/{{castToTimestampNTZ}}/{{castToTimestampLTZNanos}}/{{castToTimestampNTZNanos}},
> and mirror them in the corresponding codegen helpers. No new 
> {{Cast.needsTimeZone}} entries are required, since none of these casts 
> involves a timezone. The preview flag 
> {{spark.sql.timestampNanosTypes.enabled}} continues to gate the 
> nanosecond-typed side.
> h3. Out of scope
> * Precision-to-precision casts within the nanosecond family 
> ({{TIMESTAMP_NTZ(p1)}} -> {{TIMESTAMP_NTZ(p2)}}).
> * Cross-family casts ({{TIMESTAMP_LTZ(p)}} <-> {{TIMESTAMP_NTZ(p)}}), which 
> would require timezone handling.
> * Implicit/up-cast and store-assignment coercion; these casts remain 
> explicit-only, consistent with the existing string<->nanos casts.
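The scope rules above reduce to a simple predicate: the zone family must match, and exactly one side must be nanosecond-precision. This is a hedged Python model, not the real {{canCast}}: {{TsType}} and {{can_cast_in_scope}} are hypothetical names. It covers only the casts this ticket adds, so it ignores existing casts (for example micro-to-micro NTZ/LTZ), the preview flag, and implicit coercion.

```python
from typing import NamedTuple, Optional


class TsType(NamedTuple):
    """Hypothetical model of a timestamp type: zone family plus an optional
    nanosecond precision (None means the microsecond type; otherwise 7..9)."""
    family: str               # "NTZ" or "LTZ"
    precision: Optional[int]


def can_cast_in_scope(src: TsType, dst: TsType) -> bool:
    """True only for the four explicit casts added here: same family, and
    exactly one of the two types is nanosecond-precision."""
    if src.family != dst.family:
        return False  # cross-family casts would need timezone handling
    src_is_nanos = src.precision is not None
    dst_is_nanos = dst.precision is not None
    # nanos -> nanos (p1 -> p2) and micros -> micros are not part of this change
    return src_is_nanos != dst_is_nanos
```

Under this model, {{TIMESTAMP_NTZ(7)}} -> {{TIMESTAMP_NTZ(9)}} and {{TIMESTAMP_LTZ(9)}} -> {{TIMESTAMP_NTZ(9)}} are both rejected, matching the out-of-scope list.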
> h3. Testing
> Add coverage in {{CastSuiteBase}} (widening, narrowing/truncation, 
> round-trip, null), run under the ANSI on/off and interpreted/codegen 
> variants; optionally, add end-to-end golden-file coverage in {{cast.sql}}.
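The four test categories can be expressed as a small case table. This is only a Python sketch of the expected values, not {{CastSuiteBase}} code: {{cast_micros_to_nanos}}, {{cast_nanos_to_micros}} and {{run_cases}} are hypothetical. A nanosecond value is modeled as an {{(epochMicros, nanosWithinMicro)}} tuple and SQL NULL as {{None}}. The ANSI and codegen variants are not modeled because the plan specifies the same silent behavior for all of them.

```python
from typing import List, Optional, Tuple

# Hypothetical model: (epoch_micros, nanos_within_micro), with nanos in [0, 999]
NanosVal = Tuple[int, int]


def cast_micros_to_nanos(v: Optional[int]) -> Optional[NanosVal]:
    """Widening: NULL stays NULL, otherwise the sub-microsecond part is 0."""
    return None if v is None else (v, 0)


def cast_nanos_to_micros(v: Optional[NanosVal]) -> Optional[int]:
    """Narrowing: NULL stays NULL, otherwise keep epoch_micros (floor)."""
    return None if v is None else v[0]


def run_cases() -> List[str]:
    """Return the names of failing cases (an empty list means all pass)."""
    checks = [
        ("widening", cast_micros_to_nanos(1_000_000), (1_000_000, 0)),
        ("narrowing truncates", cast_nanos_to_micros((1_000_000, 999)), 1_000_000),
        ("narrowing before epoch", cast_nanos_to_micros((-1, 999)), -1),
        ("round trip", cast_nanos_to_micros(cast_micros_to_nanos(-42)), -42),
        ("null micros", cast_micros_to_nanos(None), None),
        ("null nanos", cast_nanos_to_micros(None), None),
    ]
    return [name for name, actual, expected in checks if actual != expected]
```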



--
This message was sent by Atlassian Jira
(v8.20.10#820010)