[ https://issues.apache.org/jira/browse/SPARK-57511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk updated SPARK-57511:
-----------------------------
Description:
h3. Problem
Spark currently supports:
* {{TIMESTAMP_NTZ(p)}} -> {{TIMESTAMP_NTZ(q)}} and {{TIMESTAMP_LTZ(p)}} -> {{TIMESTAMP_LTZ(q)}} casts
* microsecond-precision {{TIMESTAMP}} <-> {{TIMESTAMP_NTZ}} casts
However, direct cross-family casts between the parameterized types {{TIMESTAMP_LTZ(p)}} and {{TIMESTAMP_NTZ(q)}} are not fully supported for precisions in {{[6, 9]}}. This leaves a gap in timestamp cast parity for explicit {{CAST(...)}} operations.
h3. Proposal
Implement explicit cast support for:
* {{CAST(<timestamp_ltz(p)> AS TIMESTAMP_NTZ(q))}}
* {{CAST(<timestamp_ntz(p)> AS TIMESTAMP_LTZ(q))}}
where {{p, q in [6, 9]}}.
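A minimal sketch of the analyzer-side admissibility rule proposed above. It assumes a simplified, hypothetical {{TsType}} record (an LTZ/NTZ flag plus a precision) instead of Spark's actual {{DataType}} classes, and {{canCrossFamilyCast}} is likewise a hypothetical name:

```java
// Hypothetical, simplified stand-in for Spark's timestamp types:
// `ltz` is true for TIMESTAMP_LTZ(p) and false for TIMESTAMP_NTZ(p).
record TsType(boolean ltz, int precision) {}

final class CastAdmissibilitySketch {
    static final int MIN_PRECISION = 6;
    static final int MAX_PRECISION = 9;

    private CastAdmissibilitySketch() {}

    /** True if the precision lies in the [6, 9] range covered by this issue. */
    static boolean inProposedRange(int precision) {
        return precision >= MIN_PRECISION && precision <= MAX_PRECISION;
    }

    /**
     * Proposed rule for explicit CAST across the LTZ/NTZ families: admitted when
     * both precisions are in [6, 9]. Same-family casts are handled by existing
     * rules and are not modeled here, so this returns false for them.
     */
    static boolean canCrossFamilyCast(TsType from, TsType to) {
        boolean crossFamily = from.ltz() != to.ltz();
        return crossFamily
            && inProposedRange(from.precision())
            && inProposedRange(to.precision());
    }
}
```

Precisions outside {{[6, 9]}} are simply outside this issue's scope; the sketch rejects them rather than asserting how Spark treats them.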
Semantics:
* Follow the existing LTZ<->NTZ time zone conversion behavior, which is based on the session time zone.
* Follow the existing precision-narrowing behavior when {{q < p}}: truncate (floor) the fractional seconds to {{q}} digits, consistent with current cast semantics, including for pre-epoch values.
* Keep {{p=6}} behavior aligned with the existing mapping to microsecond timestamp types.
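The semantics above can be sketched with {{java.time}}, assuming an LTZ value is modeled as an {{Instant}}, an NTZ value as a {{LocalDateTime}}, and that DST gaps and overlaps resolve the way {{LocalDateTime.atZone}} resolves them (whether this matches Spark's existing conversion exactly is an assumption). The class and method names are hypothetical:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical sketch: LTZ values modeled as Instant, NTZ values as LocalDateTime.
final class LtzNtzCastSketch {
    private LtzNtzCastSketch() {}

    /** Floors a nano-of-second field (0..999_999_999) to `precision` fractional digits. */
    static int floorNanos(int nanoOfSecond, int precision) {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException("precision must be in [0, 9]: " + precision);
        }
        int unit = 1;
        for (int i = precision; i < 9; i++) {
            unit *= 10; // size of one unit of the last kept digit, in nanoseconds
        }
        return nanoOfSecond - nanoOfSecond % unit;
    }

    /** LTZ(p) -> NTZ(q): wall-clock time of the instant in the session zone, floored to q digits. */
    static LocalDateTime ltzToNtz(Instant value, ZoneId sessionZone, int q) {
        LocalDateTime local = LocalDateTime.ofInstant(value, sessionZone);
        return local.withNano(floorNanos(local.getNano(), q));
    }

    /** NTZ(p) -> LTZ(q): wall-clock time interpreted in the session zone, floored to q digits. */
    static Instant ntzToLtz(LocalDateTime value, ZoneId sessionZone, int q) {
        // atZone shifts a time inside a DST gap forward and keeps the earlier offset in an overlap.
        Instant instant = value.atZone(sessionZone).toInstant();
        return Instant.ofEpochSecond(instant.getEpochSecond(), floorNanos(instant.getNano(), q));
    }
}
```

The source precision {{p}} does not appear because the input already carries at most {{p}} fractional digits, so for {{q >= p}} the floor is a no-op. Since {{java.time}} zone offsets are whole seconds, flooring the nano-of-second field commutes with the time zone shift, and because {{Instant}} and {{LocalDateTime}} always store a non-negative nano-of-second, flooring that field rounds pre-epoch values toward negative infinity rather than toward zero.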
h3. Scope
In scope:
* Explicit {{CAST}} only.
* Analyzer cast admissibility updates.
* Runtime implementation (interpreted + codegen).
* Catalyst and SQL golden test coverage for representative {{p,q}} combinations in {{[6,9]}}.
Out of scope:
* Implicit type coercion / common-type resolution changes (e.g. {{CASE}}, {{UNION}} wider-type inference).
h3. Testing
* Add/extend catalyst cast tests for LTZ(p) <-> NTZ(q), including:
** boundary precisions ({{6}}, {{9}})
** narrowing precision cases
** pre-epoch and timezone-sensitive cases
** interpreted vs codegen parity
* Extend SQL cast golden tests and regenerate expected outputs as needed.
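For the narrowing and pre-epoch cases listed above, a property check over every {{(p, q)}} pair in {{[6, 9]}} could look like the following sketch. It is hypothetical, covers only the LTZ(p) -> NTZ(q) direction on {{java.time}} values (LTZ as {{Instant}}, NTZ as {{LocalDateTime}}), and does not model Spark's interpreted or codegen paths:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.List;

final class PrecisionMatrixSketch {
    private PrecisionMatrixSketch() {}

    /** Size in nanoseconds of one unit of the last fractional digit at `precision`. */
    static long unitNanos(int precision) {
        long unit = 1;
        for (int i = precision; i < 9; i++) {
            unit *= 10;
        }
        return unit;
    }

    /** Hypothetical LTZ(p) -> NTZ(q) conversion that floors the fraction to q digits. */
    static LocalDateTime ltzToNtz(Instant value, ZoneId zone, int q) {
        LocalDateTime local = LocalDateTime.ofInstant(value, zone);
        int nano = local.getNano();
        return local.withNano((int) (nano - nano % unitNanos(q)));
    }

    /** Returns every (p, q) pair in [6, 9] that violates one of the expected properties. */
    static List<String> checkAllPairs(Instant sample, ZoneId zone) {
        List<String> failures = new ArrayList<>();
        for (int p = 6; p <= 9; p++) {
            // The source value as it would be stored at precision p.
            Instant source = Instant.ofEpochSecond(
                sample.getEpochSecond(), sample.getNano() - sample.getNano() % unitNanos(p));
            LocalDateTime exact = LocalDateTime.ofInstant(source, zone);
            for (int q = 6; q <= 9; q++) {
                LocalDateTime result = ltzToNtz(source, zone, q);
                long lostNanos = Duration.between(result, exact).toNanos();
                boolean ok = result.getNano() % unitNanos(q) == 0 // at most q fractional digits
                    && lostNanos >= 0                              // floors, never rounds up
                    && lostNanos < unitNanos(q)                    // loses less than one unit
                    && (q < p || lostNanos == 0);                  // widening is lossless
                if (!ok) {
                    failures.add("p=" + p + ", q=" + q);
                }
            }
        }
        return failures;
    }
}
```

A pre-epoch sample such as {{1969-12-31T23:59:59.987654321Z}} in a non-UTC session zone exercises the narrowing, pre-epoch, and time zone cases at once.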
h3. User impact
User-facing improvement: explicit casts between {{TIMESTAMP_LTZ(p)}} and {{TIMESTAMP_NTZ(q)}} are supported consistently for all {{p, q in [6, 9]}}.
h3. Risk
Low to medium:
* Touches the cast type-check and conversion paths for nanosecond-precision timestamp types.
* Mitigated by focused unit tests and SQL golden-file coverage.
> Add explicit CAST support for TIMESTAMP_LTZ(p) <-> TIMESTAMP_NTZ(q) (p,q in [6,9])
> ----------------------------------------------------------------------------------
>
> Key: SPARK-57511
> URL: https://issues.apache.org/jira/browse/SPARK-57511
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)