[ https://issues.apache.org/jira/browse/SPARK-57469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-57469:
-----------------------------
    Description: 
h2. Background

SPARK-57323 added explicit {{CAST}} support between {{DATE}} and the
nanosecond-precision timestamp types {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}}
(p = 7..9). The date field extraction functions ({{year}}, {{month}},
{{day}}/{{dayofmonth}}, {{quarter}}, {{dayofyear}}, {{dayofweek}}, {{weekday}},
{{weekofyear}}, {{yearofweek}}) all extend {{GetDateField}}, which declares
{{inputTypes = Seq(DateType)}} and relies on implicit type coercion to cast a
timestamp argument to {{DATE}}.

h2. Problem

These functions do not work on nanosecond-precision timestamps in ANSI mode
(the default since Spark 4.0). For example:

{code:sql}
SET spark.sql.timestampNanosTypes.enabled=true;
SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
{code}

fails analysis with a {{DATATYPE_MISMATCH}} error.

The reason is twofold:
* The generic ANSI implicit-cast rule defers to {{Cast.canANSIStoreAssign}},
  which (by design, per SPARK-57323) returns {{false}} for {{nanos -> DATE}},
  so no implicit cast is inserted.
* The dedicated ANSI hack rule {{AnsiGetDateFieldOperationsTypeCoercion}},
  which is what makes the equivalent microsecond {{TIMESTAMP -> DATE}} field
  extraction work, matches only {{AnyTimestampTypeExpression}}, i.e. the
  microsecond {{TimestampType}} / {{TimestampNTZType}}, not the nanosecond
  types.

In non-ANSI mode the functions already work, because the default type coercion
has a blanket {{(_: DatetimeType, _: DatetimeType)}} implicit-cast arm and both
nanos types extend {{DatetimeType}}.

h2. Proposed change

Mirror the existing microsecond abstractions for the nanosecond types and reuse
them, rather than widening {{AnyTimestampType}}. That type is also used as an
{{inputTypes}} "accept-as-is, no cast" gate on many micro-only expressions, so
widening it would route raw nanos values into micro-only eval/codegen.
Concretely:

* Add {{AnyTimestampNanoType}} ({{AbstractDataType}}) and
  {{AnyTimestampNanoTypeExpression}} (expression extractor) matching
  {{TimestampLTZNanosType}} / {{TimestampNTZNanosType}}.
* Extend {{AnsiGetDateFieldOperationsTypeCoercion}} to also match
  nanos-timestamp children and cast them to {{DATE}}, exactly as it already
  does for micro timestamps.

This keeps {{Cast.canANSIStoreAssign}} / {{Cast.canUpCast}} strict for
{{DATE <-> nanos}} (the cast is inserted explicitly by the field-extraction
rule, identical to the micro path).
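
The shape of the proposed change can be illustrated with a small toy model. This is a sketch in Python, not Spark's Scala code: the type names are plain strings borrowed from the ticket, and {{Column}}, {{CastToDate}}, {{is_any_timestamp}}, {{is_any_timestamp_nano}} and {{coerce_date_field_child}} are hypothetical stand-ins invented for illustration. It only shows that a second, nanos-specific matcher lets the field-extraction rule insert the cast while the micro-only matcher stays untouched.

```python
from dataclasses import dataclass

# Toy stand-ins for the type names used in the ticket. These are plain strings,
# not Spark classes, and every function below is hypothetical.
MICRO_TIMESTAMPS = {"TimestampType", "TimestampNTZType"}
NANO_TIMESTAMPS = {"TimestampLTZNanosType", "TimestampNTZNanosType"}


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str


@dataclass(frozen=True)
class CastToDate:
    child: Column


def is_any_timestamp(expr: Column) -> bool:
    """Models the existing micro-only matcher; deliberately left unchanged."""
    return expr.data_type in MICRO_TIMESTAMPS


def is_any_timestamp_nano(expr: Column) -> bool:
    """Models the proposed nanos-only matcher."""
    return expr.data_type in NANO_TIMESTAMPS


def coerce_date_field_child(child: Column, match_nanos: bool):
    """Models the field-extraction rule: wrap a timestamp child in a cast to DATE.

    match_nanos=False corresponds to the behaviour described under "Problem";
    match_nanos=True corresponds to the proposed extension.
    """
    if is_any_timestamp(child) or (match_nanos and is_any_timestamp_nano(child)):
        return CastToDate(child)
    return child  # left as-is, so a non-DATE child fails the later type check


micro = Column("ts_micro", "TimestampNTZType")
nano = Column("ts_nano", "TimestampNTZNanosType")

before = coerce_date_field_child(nano, match_nanos=False)
after = coerce_date_field_child(nano, match_nanos=True)
print(before)
print(after)
```

In this model the nanos column is still rejected by {{is_any_timestamp}}, which is the point of adding a separate matcher instead of widening the existing one.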

h2. EXTRACT / date_part

No {{EXTRACT}}-specific change is needed for the date components.
{{EXTRACT(field FROM source)}} is a {{RuntimeReplaceable}} that rewrites via
{{DatePart.parseExtractField}} to the same {{GetDateField}} expressions
({{YEAR -> Year(source)}}, {{MONTH -> Month(source)}},
{{DAY -> DayOfMonth(source)}}, {{QUARTER}}, {{WEEK}}, {{DOY}}, {{DOW}},
{{DOW_ISO}}, {{YEAROFWEEK}}), so once the {{GetDateField}} coercion is fixed,
{{extract(year from nanos_ts)}} and {{date_part('year', nanos_ts)}} work
transitively in both ANSI and non-ANSI modes.
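
As a worked illustration of "cast to {{DATE}}, then extract", the sketch below computes the date components listed above for the timestamp from the Problem example. It is a Python toy, not Spark code, and it rests on assumptions that the ticket does not state: a {{TIMESTAMP_NTZ(9)}} value is modelled as an integer count of nanoseconds since 1970-01-01, {{WEEK}} and {{YEAROFWEEK}} are taken to follow ISO 8601 week numbering, {{DOW}} is taken to number Sunday as 1, and {{DOW_ISO}} is taken to number Monday as 1. The names {{ntz_nanos_to_date}}, {{DATE_FIELDS}} and {{extract}} are hypothetical.

```python
from datetime import date, timedelta

NANOS_PER_DAY = 86_400 * 1_000_000_000
EPOCH = date(1970, 1, 1)


def ntz_nanos_to_date(epoch_nanos: int) -> date:
    """Drop the time of day; floor division keeps pre-1970 values on the right day."""
    return EPOCH + timedelta(days=epoch_nanos // NANOS_PER_DAY)


# Field name -> function of the DATE value (assumed semantics, see lead-in).
DATE_FIELDS = {
    "YEAR": lambda d: d.year,
    "MONTH": lambda d: d.month,
    "DAY": lambda d: d.day,
    "QUARTER": lambda d: (d.month - 1) // 3 + 1,
    "WEEK": lambda d: d.isocalendar()[1],
    "DOY": lambda d: d.timetuple().tm_yday,
    "DOW": lambda d: d.isoweekday() % 7 + 1,   # Sunday = 1 ... Saturday = 7
    "DOW_ISO": lambda d: d.isoweekday(),       # Monday = 1 ... Sunday = 7
    "YEAROFWEEK": lambda d: d.isocalendar()[0],
}


def extract(field: str, epoch_nanos: int) -> int:
    """Every date component only needs the DATE part of the timestamp."""
    return DATE_FIELDS[field.upper()](ntz_nanos_to_date(epoch_nanos))


# 2020-01-01 12:30:15.123456789 as nanoseconds since the epoch.
days = (date(2020, 1, 1) - EPOCH).days
nanos_of_day = (12 * 3600 + 30 * 60 + 15) * 1_000_000_000 + 123_456_789
ts = days * NANOS_PER_DAY + nanos_of_day

result = {field: extract(field, ts) for field in DATE_FIELDS}
print(result)
```

None of the fields depends on the sub-day part of the value, which is why a single cast to {{DATE}} in front of the field function is enough for all of them.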

The time-of-day fields are already handled separately by SPARK-57340: {{HOUR}}
/ {{MINUTE}} cast the nanos source down to the matching microsecond timestamp
(lossless for those integer fields), and {{SECOND}} keeps the sub-microsecond
digits via {{SecondWithFractionNanos}}.
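
The "lossless for those integer fields" claim can be checked with plain integer arithmetic. The sketch below is a Python toy, not Spark code. It assumes a timestamp is an integer count of nanoseconds since the epoch and that narrowing to microseconds drops the last three digits by floor division; neither detail is stated in the ticket. All function names are hypothetical, and {{second_with_fraction_nanos}} is only named after {{SecondWithFractionNanos}} without claiming to match its return type.

```python
from decimal import Decimal

NANOS_PER_MICRO = 1_000
MICROS_PER_MINUTE = 60 * 1_000_000
MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE
NANOS_PER_MINUTE = 60 * 1_000_000_000


def to_micros(epoch_nanos: int) -> int:
    """Narrow nanoseconds to microseconds (assumed to floor)."""
    return epoch_nanos // NANOS_PER_MICRO


def hour(epoch_micros: int) -> int:
    return (epoch_micros // MICROS_PER_HOUR) % 24


def minute(epoch_micros: int) -> int:
    return (epoch_micros // MICROS_PER_MINUTE) % 60


def second_with_fraction_micros(epoch_micros: int) -> Decimal:
    """Seconds of the minute with 6 fractional digits."""
    return Decimal(epoch_micros % MICROS_PER_MINUTE).scaleb(-6)


def second_with_fraction_nanos(epoch_nanos: int) -> Decimal:
    """Seconds of the minute with all 9 fractional digits."""
    return Decimal(epoch_nanos % NANOS_PER_MINUTE).scaleb(-9)


# 1970-01-01 12:30:15.123456789 as nanoseconds since the epoch.
ts_nanos = (12 * 3600 + 30 * 60 + 15) * 1_000_000_000 + 123_456_789
ts_micros = to_micros(ts_nanos)

print(hour(ts_micros), minute(ts_micros))
print(second_with_fraction_micros(ts_micros))
print(second_with_fraction_nanos(ts_nanos))
```

Hour and minute come out the same whether they are read before or after the narrowing, while the seconds value loses its last three digits on the microsecond path, which is why {{SECOND}} needs its own nanos-aware expression.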

Tests will cover both the function form ({{year(ts)}} ...) and the {{EXTRACT}}
/ {{date_part}} forms, over {{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_LTZ(p)}}, in
ANSI and non-ANSI modes.

h2. Scope

In scope: date field extraction functions (and the transitive {{EXTRACT}} /
{{date_part}} date components) over {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}}
in both ANSI and non-ANSI modes, with tests.

Out of scope (separate follow-up): other call sites that match only
{{AnyTimestampTypeExpression}}, namely {{date_add}} / {{date_sub}}, timestamp
subtraction ({{SubtractTimestamps}}), and the binary arithmetic datetime
resolver, since each needs its own precision-preservation decision.

> Support date field functions on nanosecond-precision timestamps in ANSI mode
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-57469
>                 URL: https://issues.apache.org/jira/browse/SPARK-57469
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major


