[ https://issues.apache.org/jira/browse/SPARK-57469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-57469:
-----------------------------------
Labels: pull-request-available (was: )
> Support date field functions on nanosecond-precision timestamps in ANSI mode
> ----------------------------------------------------------------------------
>
> Key: SPARK-57469
> URL: https://issues.apache.org/jira/browse/SPARK-57469
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. Background
> SPARK-57323 added explicit {{CAST}} support between {{DATE}} and the
> nanosecond-precision timestamp types {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}}
> (p = 7..9). The date field extraction functions ({{year}}, {{month}},
> {{day}}/{{dayofmonth}}, {{quarter}}, {{dayofyear}}, {{dayofweek}}, {{weekday}},
> {{weekofyear}}, {{yearofweek}}) all extend {{GetDateField}}, which declares
> {{inputTypes = Seq(DateType)}} and relies on implicit type coercion to cast a
> timestamp argument to {{DATE}}.
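> For comparison, the microsecond path already works in ANSI mode. There the
> argument is cast to {{DATE}} before the field is extracted. A minimal sketch
> follows. The literal is illustrative; 2020-01-01 was a Wednesday.
> {code:sql}
> SET spark.sql.ansi.enabled=true;
> -- Micro TIMESTAMP_NTZ argument: cast to DATE by the ANSI field-extraction rule.
> SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456');       -- 2020
> SELECT dayofweek(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456');  -- 4 (1 = Sunday)
> {code}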
> h2. Problem
> These functions do not work on nanosecond-precision timestamps in ANSI mode (the
> default since Spark 4.0). For example:
> {code:sql}
> SET spark.sql.timestampNanosTypes.enabled=true;
> SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
> {code}
> fails analysis with a {{DATATYPE_MISMATCH}} error.
> The reason is twofold:
> * The generic ANSI implicit-cast rule defers to {{Cast.canANSIStoreAssign}},
> which (by design, per SPARK-57323) returns {{false}} for {{nanos -> DATE}}, so
> no implicit cast is inserted.
> * The dedicated ANSI hack rule {{AnsiGetDateFieldOperationsTypeCoercion}} is
> exactly what makes the equivalent micro {{TIMESTAMP -> DATE}} field extraction
> work. However, it matches only {{AnyTimestampTypeExpression}}, i.e. the
> microsecond {{TimestampType}} / {{TimestampNTZType}}, not the nanosecond types.
> In non-ANSI mode the functions already work. The default type coercion has a
> blanket {{(_: DatetimeType, _: DatetimeType)}} implicit-cast arm, and both nanos
> types extend {{DatetimeType}}.
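> A sketch of the mode dependence, reusing the literal from the example above.
> {{spark.sql.ansi.enabled}} is the standard ANSI switch. The nanos flag is the
> one from the original example.
> {code:sql}
> SET spark.sql.timestampNanosTypes.enabled=true;
> -- Non-ANSI: the blanket DatetimeType -> DatetimeType arm inserts the cast to DATE.
> SET spark.sql.ansi.enabled=false;
> SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
> -- ANSI (default): before this change, the same query fails with DATATYPE_MISMATCH.
> SET spark.sql.ansi.enabled=true;
> SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
> {code}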
> h2. Proposed change
> Mirror the existing micro abstractions for the nanosecond types and reuse them,
> rather than widening {{AnyTimestampType}}. {{AnyTimestampType}} is also used as
> an {{inputTypes}} "accept-as-is, no cast" gate on many micro-only expressions,
> so widening it would route raw nanos values into micro-only eval/codegen.
> Instead:
> * Add {{AnyTimestampNanoType}} ({{AbstractDataType}}) and
> {{AnyTimestampNanoTypeExpression}} (expression extractor) matching
> {{TimestampLTZNanosType}} / {{TimestampNTZNanosType}}.
> * Extend {{AnsiGetDateFieldOperationsTypeCoercion}} to also match
> nanos-timestamp children and cast them to {{DATE}}, exactly as it already does
> for micro timestamps.
> This keeps {{Cast.canANSIStoreAssign}} / {{Cast.canUpCast}} strict for
> {{DATE <-> nanos}}. The cast is inserted explicitly by the field-extraction
> rule, identical to the micro path.
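> In SQL terms, the extended rule should make the implicit form equivalent to the
> explicit cast that SPARK-57323 already permits. A sketch, assuming that explicit
> {{CAST}} is available in the build:
> {code:sql}
> SET spark.sql.timestampNanosTypes.enabled=true;
> SET spark.sql.ansi.enabled=true;
> -- Explicit cast (SPARK-57323): allowed even though canANSIStoreAssign stays false.
> SELECT year(CAST(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9) AS DATE));
> -- Implicit form: after the change, the coercion rule inserts the same CAST to DATE.
> SELECT year(TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9));
> {code}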
> h2. EXTRACT / date_part
> No {{EXTRACT}}-specific change is needed for the date components.
> {{EXTRACT(field FROM source)}} is a {{RuntimeReplaceable}} that rewrites via
> {{DatePart.parseExtractField}} to the same {{GetDateField}} expressions
> ({{YEAR -> Year(source)}}, {{MONTH -> Month(source)}},
> {{DAY -> DayOfMonth(source)}}, {{QUARTER}}, {{WEEK}}, {{DOY}}, {{DOW}},
> {{DOW_ISO}}, {{YEAROFWEEK}}). Once the {{GetDateField}} coercion is fixed,
> {{extract(year from nanos_ts)}} and {{date_part('year', nanos_ts)}} therefore
> work transitively in both ANSI and non-ANSI modes.
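> A sketch of the three equivalent spellings, plus one other date field, once the
> coercion fix is in place. The inline {{VALUES}} table and its column name
> {{ts}} are illustrative.
> {code:sql}
> SET spark.sql.timestampNanosTypes.enabled=true;
> SET spark.sql.ansi.enabled=true;
> -- The first three columns all rewrite to Year(ts); DOW_ISO goes through another GetDateField.
> SELECT
>   year(ts),
>   extract(YEAR FROM ts),
>   date_part('year', ts),
>   extract(DOW_ISO FROM ts)
> FROM VALUES (TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9)) AS t(ts);
> {code}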
> The time-of-day fields are already handled separately by SPARK-57340.
> {{HOUR}} / {{MINUTE}} cast the nanos source down to the matching microsecond
> timestamp, which is lossless for those integer fields. {{SECOND}} keeps the
> sub-microsecond digits via {{SecondWithFractionNanos}}.
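> For contrast, the time-of-day fields go through the SPARK-57340 paths rather
> than through {{GetDateField}}. A sketch over the same illustrative value:
> {code:sql}
> SET spark.sql.timestampNanosTypes.enabled=true;
> SELECT
>   extract(HOUR FROM ts),    -- nanos source cast down to the micro timestamp
>   extract(MINUTE FROM ts),  -- nanos source cast down to the micro timestamp
>   extract(SECOND FROM ts)   -- SecondWithFractionNanos keeps sub-microsecond digits
> FROM VALUES (TIMESTAMP_NTZ '2020-01-01 12:30:15.123456789'::timestamp_ntz(9)) AS t(ts);
> {code}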
> Tests will cover both the function form ({{year(ts)}} ...) and the
> {{EXTRACT}} / {{date_part}} forms, over {{TIMESTAMP_NTZ(p)}} and
> {{TIMESTAMP_LTZ(p)}}, in ANSI and non-ANSI modes.
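> One possible shape for such a check uses the built-in {{assert_true}} to compare
> the three forms over a {{TIMESTAMP_LTZ(7)}} value. This is a hypothetical
> sketch, not the test suite added by the pull request. The same query would be
> repeated for p = 8, 9, for {{TIMESTAMP_NTZ(p)}}, and with
> {{spark.sql.ansi.enabled=false}}.
> {code:sql}
> SET spark.sql.timestampNanosTypes.enabled=true;
> SET spark.sql.ansi.enabled=true;
> -- assert_true returns NULL when the condition holds and raises an error otherwise.
> SELECT assert_true(year(ts) = extract(YEAR FROM ts) AND year(ts) = date_part('year', ts))
> FROM VALUES (TIMESTAMP_LTZ '2020-06-15 12:30:15.1234567'::timestamp_ltz(7)) AS t(ts);
> {code}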
> h2. Scope
> In scope: date field extraction functions (and the transitive {{EXTRACT}} /
> {{date_part}} date components) over {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}}
> in both ANSI and non-ANSI modes, with tests.
> Out of scope (separate follow-up): other call sites that match only
> {{AnyTimestampTypeExpression}}, namely {{date_add}} / {{date_sub}}, timestamp
> subtraction ({{SubtractTimestamps}}), and the binary arithmetic datetime
> resolver. Each of these needs its own precision-preservation decision.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)