Max Gekk created SPARK-57469:
--------------------------------

             Summary: Support date field functions on nanosecond-precision timestamps in ANSI mode
                 Key: SPARK-57469
                 URL: https://issues.apache.org/jira/browse/SPARK-57469
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. Background

SPARK-57323 added explicit {{CAST}} support between {{DATE}} and the nanosecond-precision
timestamp types {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} (p = 7..9). The date field
extraction functions ({{year}}, {{month}}, {{day}}/{{dayofmonth}}, {{quarter}},
{{dayofyear}}, {{dayofweek}}, {{weekday}}, {{weekofyear}}, {{yearofweek}}) all extend
{{GetDateField}}, which declares {{inputTypes = Seq(DateType)}} and relies on implicit type
coercion to cast a timestamp argument to {{DATE}}.

h2. Problem

These functions do not work on nanosecond-precision timestamps in ANSI mode (the default
since Spark 4.0). For example:

{code:sql}
SET spark.sql.timestampNanosTypes.enabled=true;
SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
{code}

fails analysis with a {{DATATYPE_MISMATCH}} error.

The reason is twofold:
* The generic ANSI implicit-cast rule defers to {{Cast.canANSIStoreAssign}}, which (by design,
  per SPARK-57323) returns {{false}} for {{nanos -> DATE}}, so no implicit cast is inserted.
* The dedicated ANSI hack rule {{AnsiGetDateFieldOperationsTypeCoercion}}, which is exactly
  what makes the equivalent micro {{TIMESTAMP -> DATE}} field extraction work, matches only
  {{AnyTimestampTypeExpression}}, i.e. the microsecond {{TimestampType}} / {{TimestampNTZType}},
  not the nanosecond types.

In non-ANSI mode the functions already work, because the default type coercion has a blanket
{{(_: DatetimeType, _: DatetimeType)}} implicit-cast arm and both nanos types extend
{{DatetimeType}}.

h2. Proposed change

Mirror the existing micro abstractions for the nanosecond types and reuse them, rather than
widening {{AnyTimestampType}} (which is also used as an {{inputTypes}} "accept-as-is, no cast"
gate on many micro-only expressions, so widening it would route raw nanos values into
micro-only eval/codegen):

* Add {{AnyTimestampNanoType}} ({{AbstractDataType}}) and {{AnyTimestampNanoTypeExpression}}
  (expression extractor) matching {{TimestampLTZNanosType}} / {{TimestampNTZNanosType}}.
* Extend {{AnsiGetDateFieldOperationsTypeCoercion}} to also match nanos-timestamp children and
  cast them to {{DATE}}, exactly as it already does for micro timestamps.

This keeps {{Cast.canANSIStoreAssign}} / {{Cast.canUpCast}} strict for {{DATE <-> nanos}} (the
cast is inserted explicitly by the field-extraction rule, identical to the micro path).
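
To illustrate the intended effect, the extended rule should make the failing query from the Problem section behave like the manually cast form below. This is only a sketch: it assumes the explicit nanos-to-{{DATE}} cast from SPARK-57323 is available and reuses the config flag and literal from the example above.

{code:sql}
SET spark.sql.timestampNanosTypes.enabled=true;

-- Explicit cast to DATE written by hand; the extended coercion rule is
-- expected to insert the equivalent cast around the GetDateField child.
SELECT year(CAST(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9) AS DATE));
{code}

Until the rule is extended, the manually cast form may also serve as a workaround in ANSI mode.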

h2. Scope

In scope: date field extraction functions over {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} in
both ANSI and non-ANSI modes, with tests.

Out of scope (separate follow-up): other call sites that match only {{AnyTimestampTypeExpression}},
namely {{date_add}} / {{date_sub}}, timestamp subtraction ({{SubtractTimestamps}}), and the binary
arithmetic datetime resolver, since each needs its own precision-preservation decision.



