[ 
https://issues.apache.org/jira/browse/SPARK-57511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-57511:
-----------------------------
    Description: 
h3. Problem

Spark currently supports:
* {{TIMESTAMP_NTZ(p)}} -> {{TIMESTAMP_NTZ(q)}} and {{TIMESTAMP_LTZ(p)}} -> 
{{TIMESTAMP_LTZ(q)}} casts
* microsecond-family {{TIMESTAMP}} <-> {{TIMESTAMP_NTZ}} casts

But direct cross-family nanos casts between {{TIMESTAMP_LTZ(p)}} and 
{{TIMESTAMP_NTZ(q)}} are not fully supported for precision parameters in {{[6, 
9]}}. This creates a gap in timestamp cast parity for explicit {{CAST(...)}} 
operations.

h3. Proposal

Implement explicit cast support for:
* {{CAST(<timestamp_ltz(p)> AS TIMESTAMP_NTZ(q))}}
* {{CAST(<timestamp_ntz(p)> AS TIMESTAMP_LTZ(q))}}

where {{p, q in [6, 9]}}.
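
As a rough illustration of the admissibility rule above, the sketch below models the check in Python. It is not Spark's analyzer code: {{TimestampType}}, {{can_cast_explicitly}} and the string family tags are hypothetical names invented for this example, and the only rule encoded is the one stated in this issue (both precisions must lie in [6, 9]).

```python
from dataclasses import dataclass

# Precision range taken from this issue; everything else here is hypothetical.
MIN_PRECISION, MAX_PRECISION = 6, 9


@dataclass(frozen=True)
class TimestampType:
    """Hypothetical stand-in for a parameterized timestamp type."""
    family: str     # "LTZ" or "NTZ"
    precision: int  # number of fractional-second digits


def can_cast_explicitly(src: TimestampType, dst: TimestampType) -> bool:
    """Return True if an explicit CAST from src to dst would be admitted.

    Same-family casts are already supported according to the issue; the
    cross-family combinations are what the proposal adds.
    """
    for t in (src, dst):
        if t.family not in ("LTZ", "NTZ"):
            return False
        if not MIN_PRECISION <= t.precision <= MAX_PRECISION:
            return False
    return True
```

Under these assumptions, {{can_cast_explicitly(TimestampType("LTZ", 9), TimestampType("NTZ", 6))}} is admitted, while any type with a precision outside [6, 9] is rejected.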

Semantics:
* Follow existing LTZ<->NTZ timezone conversion behavior.
* Follow existing precision narrowing behavior ({{q < p}}): truncate/floor 
fractional precision consistently with current cast semantics, including 
pre-epoch values.
* Keep {{p=6}} behavior aligned with the existing mapping to microsecond 
timestamp types.
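
The semantics above can be sketched in Python as follows. This is a minimal model, not the Spark implementation. It assumes values are held as a signed count of nanoseconds, reads "truncate/floor ... including pre-epoch values" as rounding toward negative infinity, and replaces the session time zone with a fixed UTC offset (a real zone with DST would need an offset lookup per value). The function names are hypothetical.

```python
from datetime import timedelta

NANOS_PER_SECOND = 1_000_000_000


def narrow(nanos: int, q: int) -> int:
    """Floor a nanosecond count to q fractional-second digits (6 <= q <= 9)."""
    if not 6 <= q <= 9:
        raise ValueError(f"precision {q} is outside [6, 9]")
    step = 10 ** (9 - q)
    # Python's // rounds toward negative infinity, so pre-epoch (negative)
    # values move to the earlier tick instead of toward zero.
    return (nanos // step) * step


def offset_nanos(offset: timedelta) -> int:
    """Whole-second UTC offset expressed in nanoseconds."""
    return (offset // timedelta(seconds=1)) * NANOS_PER_SECOND


def ltz_to_ntz(epoch_nanos: int, offset: timedelta, q: int) -> int:
    """Instant -> local wall-clock nanos at the given offset, narrowed to q."""
    return narrow(epoch_nanos + offset_nanos(offset), q)


def ntz_to_ltz(local_nanos: int, offset: timedelta, q: int) -> int:
    """Local wall-clock nanos -> instant at the given offset, narrowed to q."""
    return narrow(local_nanos - offset_nanos(offset), q)
```

For example, {{narrow(-1, 6)}} yields {{-1000}} in this model: one nanosecond before the epoch floors to the previous microsecond rather than to zero.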

h3. Scope

In scope:
* Explicit {{CAST}} only.
* Analyzer cast admissibility updates.
* Runtime implementation (interpreted + codegen).
* Catalyst and SQL golden test coverage for representative {{p,q}} combinations 
in {{[6,9]}}.

Out of scope:
* Implicit type coercion / common-type resolution changes (e.g. {{CASE}}, 
{{UNION}} wider-type inference).

h3. Testing

* Add/extend catalyst cast tests for LTZ(p) <-> NTZ(q), including:
** boundary precisions ({{6}}, {{9}})
** narrowing precision cases
** pre-epoch and timezone-sensitive cases
** interpreted vs codegen parity
* Extend SQL cast golden tests and regenerate expected outputs as needed.
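
One way to enumerate the case grid listed above is sketched below in Python. It is only an illustration of the coverage shape, not the actual Catalyst or golden-file tests: the sample values, {{floor_to_precision}}, {{cast_cases}} and {{check_case}} are hypothetical, floor narrowing is assumed, and time zone handling and interpreted vs codegen parity are not modeled.

```python
from itertools import product

PRECISIONS = range(6, 10)  # 6, 7, 8, 9 as stated in this issue

# Hypothetical sample inputs in nanoseconds, including pre-epoch values.
SAMPLES = [-1_000_000_001, -1, 0, 1, 999_999_999]


def floor_to_precision(nanos: int, q: int) -> int:
    """Floor a nanosecond count to q fractional-second digits."""
    step = 10 ** (9 - q)
    return (nanos // step) * step


def cast_cases():
    """Yield (p, q, value) where value is representable at source precision p."""
    for p, q in product(PRECISIONS, repeat=2):
        for raw in SAMPLES:
            yield p, q, floor_to_precision(raw, p)


def check_case(p: int, q: int, value: int) -> None:
    """Assert the properties expected of a floor-based precision change."""
    out = floor_to_precision(value, q)
    step = 10 ** (9 - q)
    assert out % step == 0            # result sits on a precision-q tick
    assert 0 <= value - out < step    # never rounds up, loses less than one tick
    if q >= p:
        assert out == value           # same or wider precision is lossless


for case in cast_cases():
    check_case(*case)
```

The grid covers all 16 {{(p, q)}} pairs, so the boundary precisions and every narrowing combination appear at least once.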

h3. User impact

User-facing improvement: explicit casts between {{TIMESTAMP_LTZ(p)}} and 
{{TIMESTAMP_NTZ(q)}} become supported consistently for {{p,q in [6,9]}}.

h3. Risk

Low to medium:
* Touches cast type-check and conversion paths for nanos timestamp types.
* Mitigated by focused unit + SQL golden coverage.

  was:
### Problem
Spark currently supports:
- `TIMESTAMP_NTZ(p)` -> `TIMESTAMP_NTZ(q)` and `TIMESTAMP_LTZ(p)` -> 
`TIMESTAMP_LTZ(q)` casts
- microsecond family `TIMESTAMP` <-> `TIMESTAMP_NTZ` casts

But direct cross-family nanos casts between `TIMESTAMP_LTZ(p)` and 
`TIMESTAMP_NTZ(q)` are not fully supported for precision parameters in `[6, 9]`.

This creates a gap in timestamp cast parity for explicit `CAST(...)` operations.

### Proposal
Implement explicit cast support for:
- `CAST(<timestamp_ltz(p)> AS TIMESTAMP_NTZ(q))`
- `CAST(<timestamp_ntz(p)> AS TIMESTAMP_LTZ(q))`
where `p, q in [6, 9]`.

Semantics:
- Follow existing LTZ<->NTZ timezone conversion behavior.
- Follow existing precision narrowing behavior (`q < p`): truncate/floor 
fractional precision consistently with current cast semantics, including 
pre-epoch values.
- Keep `p=6` behavior aligned with existing mapping to microsecond timestamp 
types.

### Scope
In scope:
- Explicit `CAST` only.
- Analyzer cast admissibility updates.
- Runtime implementation (interpreted + codegen).
- Catalyst and SQL golden test coverage for representative `p,q` combinations 
in `[6,9]`.

Out of scope:
- Implicit type coercion/common-type resolution changes (e.g. `CASE`, `UNION` 
wider-type inference).

### Testing
- Add/extend catalyst cast tests for LTZ(p) <-> NTZ(q), including:
  - boundary precisions (`6`, `9`)
  - narrowing precision cases
  - pre-epoch and timezone-sensitive cases
  - interpreted vs codegen parity
- Extend SQL cast golden tests and regenerate expected outputs as needed.

### User impact
User-facing improvement:
- Explicit casts between `TIMESTAMP_LTZ(p)` and `TIMESTAMP_NTZ(q)` become 
supported consistently for `p,q in [6,9]`.

### Risk
Low to medium:
- Touches cast type-check and conversion paths for nanos timestamp types.
- Mitigated by focused unit + SQL golden coverage.


> Add explicit CAST support for TIMESTAMP_LTZ(p) <-> TIMESTAMP_NTZ(q) (p,q in 
> [6,9])
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-57511
>                 URL: https://issues.apache.org/jira/browse/SPARK-57511
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major


