Max Gekk created SPARK-57469:
--------------------------------
Summary: Support date field functions on nanosecond-precision timestamps in ANSI mode
Key: SPARK-57469
URL: https://issues.apache.org/jira/browse/SPARK-57469
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. Background
SPARK-57323 added explicit {{CAST}} support between {{DATE}} and the
nanosecond-precision timestamp types {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}}
(p = 7..9). The date field extraction functions ({{year}}, {{month}},
{{day}}/{{dayofmonth}}, {{quarter}}, {{dayofyear}}, {{dayofweek}}, {{weekday}},
{{weekofyear}}, {{yearofweek}}) all extend {{GetDateField}}, which declares
{{inputTypes = Seq(DateType)}} and relies on implicit type coercion to cast a
timestamp argument to {{DATE}}.
h2. Problem
These functions do not work on nanosecond-precision timestamps in ANSI mode
(the default since Spark 4.0). For example:
{code:sql}
SET spark.sql.timestampNanosTypes.enabled=true;
SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
{code}
fails analysis with a {{DATATYPE_MISMATCH}} error.
The reason is twofold:
* The generic ANSI implicit-cast rule defers to {{Cast.canANSIStoreAssign}}, which (by design, per SPARK-57323) returns {{false}} for {{nanos -> DATE}}, so no implicit cast is inserted.
* The dedicated ANSI hack rule {{AnsiGetDateFieldOperationsTypeCoercion}}, which is exactly what makes the equivalent microsecond {{TIMESTAMP -> DATE}} field extraction work, matches only {{AnyTimestampTypeExpression}}, i.e. the microsecond {{TimestampType}} / {{TimestampNTZType}}, not the nanosecond types.
In non-ANSI mode the functions already work, because the default type coercion
has a blanket {{(_: DatetimeType, _: DatetimeType)}} implicit-cast arm and both
nanos types extend {{DatetimeType}}.
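
The three coercion paths described in this section can be illustrated with a toy model. The Python below is not Spark code: types are plain strings, and {{can_ansi_store_assign_to_date}}, {{ansi_get_date_field_rule}}, {{non_ansi_rule}} and {{resolve_date_field_argument}} are hypothetical stand-ins for {{Cast.canANSIStoreAssign}}, {{AnsiGetDateFieldOperationsTypeCoercion}}, the default coercion arm, and the analyzer as a whole. It encodes only what this ticket states, plus one assumption flagged in the comments.

```python
# Toy model of the coercion paths described in this ticket. NOT Spark code.
from typing import Optional

DATE = "DateType"
MICRO_TIMESTAMPS = frozenset({"TimestampType", "TimestampNTZType"})
NANO_TIMESTAMPS = frozenset({"TimestampLTZNanosType", "TimestampNTZNanosType"})
DATETIME_TYPES = frozenset({DATE}) | MICRO_TIMESTAMPS | NANO_TIMESTAMPS


def can_ansi_store_assign_to_date(src: str) -> bool:
    """Stand-in for the Cast.canANSIStoreAssign check with DATE as the target.

    The ticket states that nanos -> DATE returns false by design.
    Assumption of this sketch: micro -> DATE is treated the same way here,
    because the ticket credits the dedicated rule for the micro case.
    """
    return src == DATE


def ansi_get_date_field_rule(arg_type: str) -> Optional[str]:
    """Stand-in for the dedicated ANSI rule as it exists before this change:
    it matches only the microsecond timestamp types."""
    return DATE if arg_type in MICRO_TIMESTAMPS else None


def non_ansi_rule(arg_type: str) -> Optional[str]:
    """Stand-in for the blanket (DatetimeType, DatetimeType) implicit-cast arm."""
    return DATE if arg_type in DATETIME_TYPES else None


def resolve_date_field_argument(arg_type: str, ansi: bool) -> str:
    """Return the type a date field function sees after coercion,
    or raise if no rule can produce the required DATE input."""
    if arg_type == DATE:
        return DATE
    if ansi:
        target = ansi_get_date_field_rule(arg_type)
        if target is None and can_ansi_store_assign_to_date(arg_type):
            target = DATE
    else:
        target = non_ansi_rule(arg_type)
    if target is None:
        raise TypeError(f"DATATYPE_MISMATCH: cannot coerce {arg_type} to {DATE}")
    return target
```

In this model, {{resolve_date_field_argument("TimestampNTZType", ansi=True)}} and {{resolve_date_field_argument("TimestampNTZNanosType", ansi=False)}} both resolve to {{DateType}}, while the nanos type with {{ansi=True}} raises, which mirrors the failure shown in the SQL example.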
h2. Proposed change
Mirror the existing micro abstractions for the nanosecond types and reuse them,
rather than widening {{AnyTimestampType}} (which is also used as an
{{inputTypes}} "accept-as-is, no cast" gate on many micro-only expressions, so
widening it would route raw nanos values into micro-only eval/codegen):
* Add {{AnyTimestampNanoType}} ({{AbstractDataType}}) and {{AnyTimestampNanoTypeExpression}} (expression extractor) matching {{TimestampLTZNanosType}} / {{TimestampNTZNanosType}}.
* Extend {{AnsiGetDateFieldOperationsTypeCoercion}} to also match nanos-timestamp children and cast them to {{DATE}}, exactly as it already does for micro timestamps.
This keeps {{Cast.canANSIStoreAssign}} / {{Cast.canUpCast}} strict for
{{DATE <-> nanos}} (the cast is inserted explicitly by the field-extraction
rule, identical to the micro path).
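
The design choice can be shown with a small toy model. Again, this is Python rather than Spark's Scala, types are plain strings, and every function name is a hypothetical stand-in: {{is_any_timestamp}} models the unchanged {{AnyTimestampType}}, {{is_any_timestamp_nano}} models the proposed {{AnyTimestampNanoType}}, and {{micro_only_expression_accepts}} models an expression that uses {{AnyTimestampType}} as its "accept-as-is" gate.

```python
# Toy model of the proposed change. NOT Spark code; all names are illustrative.
DATE = "DateType"
MICRO_TIMESTAMPS = frozenset({"TimestampType", "TimestampNTZType"})
NANO_TIMESTAMPS = frozenset({"TimestampLTZNanosType", "TimestampNTZNanosType"})


def is_any_timestamp(data_type: str) -> bool:
    """Models AnyTimestampType, deliberately left micro-only."""
    return data_type in MICRO_TIMESTAMPS


def is_any_timestamp_nano(data_type: str) -> bool:
    """Models the proposed AnyTimestampNanoType matcher."""
    return data_type in NANO_TIMESTAMPS


def ansi_get_date_field_rule(arg_type: str) -> str:
    """Models the extended rule: a micro OR nanos timestamp child of a date
    field function is explicitly cast to DATE; anything else is untouched."""
    if is_any_timestamp(arg_type) or is_any_timestamp_nano(arg_type):
        return DATE
    return arg_type


def micro_only_expression_accepts(arg_type: str) -> bool:
    """Models an expression whose inputTypes use AnyTimestampType as an
    accept-as-is gate. Because that matcher is not widened, nanos values
    are never passed through to micro-only evaluation."""
    return is_any_timestamp(arg_type)
```

With the separate matcher, {{ansi_get_date_field_rule}} maps both micro and nanos timestamp types to {{DateType}}, while {{micro_only_expression_accepts("TimestampLTZNanosType")}} stays false. Widening {{is_any_timestamp}} instead would flip that second result, which is the risk the ticket calls out.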
h2. Scope
In scope: date field extraction functions over {{TIMESTAMP_NTZ(p)}} /
{{TIMESTAMP_LTZ(p)}} in both ANSI and non-ANSI modes, with tests.
Out of scope (separate follow-up): other call sites that match only
{{AnyTimestampTypeExpression}}, namely {{date_add}} / {{date_sub}}, timestamp
subtraction ({{SubtractTimestamps}}), and the binary arithmetic datetime
resolver, since each needs its own precision-preservation decision.