Max Gekk created SPARK-57315:
--------------------------------
Summary: Support HOUR, MINUTE and SECOND functions over nanosecond-precision timestamps
Key: SPARK-57315
URL: https://issues.apache.org/jira/browse/SPARK-57315
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
Assignee: Max Gekk

The nanosecond-precision timestamp types TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p),
with p in [7, 9], are currently being added to Spark SQL. Their physical value
is TimestampNanosVal(epochMicros: Long, nanosWithinMicro: Short).
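
To make the physical layout concrete, here is a minimal Python model of TimestampNanosVal. It is an illustration only, not Spark's class. The field names are transliterated to snake_case, and it assumes two things the issue does not state: nanosWithinMicro is normalized to [0, 999], and pre-1970 instants round toward negative infinity when split.

```python
from dataclasses import dataclass

NANOS_PER_MICRO = 1000


@dataclass(frozen=True)
class TimestampNanosVal:
    """Illustrative model of the physical value (not Spark's actual class).

    Assumption: nanos_within_micro is normalized to [0, 999].
    """
    epoch_micros: int        # microseconds since 1970-01-01T00:00:00 (a Long in Spark)
    nanos_within_micro: int  # extra nanoseconds below one microsecond (a Short in Spark)

    def __post_init__(self):
        if not 0 <= self.nanos_within_micro < NANOS_PER_MICRO:
            raise ValueError("nanos_within_micro must be in [0, 999]")

    @staticmethod
    def from_epoch_nanos(epoch_nanos: int) -> "TimestampNanosVal":
        # divmod floors toward negative infinity, so pre-1970 instants
        # still get a non-negative nanos_within_micro.
        micros, nanos = divmod(epoch_nanos, NANOS_PER_MICRO)
        return TimestampNanosVal(micros, nanos)

    def to_epoch_nanos(self) -> int:
        return self.epoch_micros * NANOS_PER_MICRO + self.nanos_within_micro
```

For example, 1500 ns after the epoch splits into (1, 500), and 1 ns before the epoch splits into (-1, 999).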

The time-of-day extraction functions hour(), minute() and second() do not yet
accept these types. They are implemented by the GetTimeField expressions Hour,
Minute and Second, whose inputTypes is AnyTimestampType. That type collection
accepts only the microsecond-precision TimestampType and TimestampNTZType, so
calling these functions on a TIMESTAMP_NTZ(p) or TIMESTAMP_LTZ(p) value fails
analysis.

These three functions return an integer field (hour 0-23, minute 0-59, second
0-59) that depends only on epochMicros; the sub-microsecond digits never affect
the result. We can therefore reuse the existing expressions and DateTimeUtils
logic by casting the nanosecond input down to the matching microsecond type
before evaluation:
- TimestampNTZNanosType(p) -> TimestampNTZType (UTC / wall-clock extraction)
- TimestampLTZNanosType(p) -> TimestampType (session-zone extraction)

The cast (already available via SPARK-57293) keeps epochMicros and drops
nanosWithinMicro. The cast itself is lossy, but it discards nothing that these
integer results depend on.
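
The following is a minimal sketch of why the cast is safe, using only the Python standard library. A hypothetical cast_to_micros keeps epochMicros and ignores nanosWithinMicro. The fields are then read in UTC for TIMESTAMP_NTZ(p), or in a session zone for TIMESTAMP_LTZ(p). The session zone is passed as a tzinfo, so a fixed offset works. This models the semantics only; it is not Spark's DateTimeUtils code.

```python
from datetime import datetime, timedelta, timezone, tzinfo

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def cast_to_micros(epoch_micros: int, nanos_within_micro: int) -> int:
    """Hypothetical stand-in for the nanos -> micros cast:
    keep epochMicros, drop nanosWithinMicro."""
    return epoch_micros


def time_fields_ntz(epoch_micros: int) -> tuple:
    """hour/minute/second of a TIMESTAMP_NTZ value: wall-clock fields, read in UTC."""
    dt = EPOCH + timedelta(microseconds=epoch_micros)
    return dt.hour, dt.minute, dt.second


def time_fields_ltz(epoch_micros: int, session_zone: tzinfo) -> tuple:
    """hour/minute/second of a TIMESTAMP_LTZ value: the instant viewed in the session zone."""
    dt = (EPOCH + timedelta(microseconds=epoch_micros)).astimezone(session_zone)
    return dt.hour, dt.minute, dt.second
```

Take 2024-01-02 03:04:05.123456 UTC as an example. For the NTZ path, every nanosWithinMicro value from 0 to 999 yields (3, 4, 5). For the LTZ path with a +09:00 session zone, the result is (12, 4, 5).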

Implementation:
- Add a dedicated analyzer rule (ResolveTimestampNanosExpressions), modeled on
ResolveBinaryArithmetic, that rewrites a resolved Hour/Minute/Second whose
child is a nanosecond timestamp type into <expr>(Cast(child, microType)).
A dedicated rule is preferred over a TypeCoercion rule so that the behavioral
change stays scoped to these functions instead of every expression that
accepts AnyTimestampType.
- The rule is named generically so future nanos-aware expressions can be added
as additional case branches.
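
The rewrite can be sketched as follows. This is a simplified Python model, not Catalyst. The data type and expression classes are hypothetical stand-ins, and so is resolve_timestamp_nanos_expressions. The rule only inspects the top-level expression instead of traversing a plan. The nanos_enabled parameter stands in for the spark.sql.timestampNanosTypes.enabled gate.

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for Catalyst data types (not Spark classes).
@dataclass(frozen=True)
class TimestampType:  # microsecond precision, session-zone (LTZ) semantics
    pass

@dataclass(frozen=True)
class TimestampNTZType:  # microsecond precision, wall-clock (NTZ) semantics
    pass

@dataclass(frozen=True)
class TimestampLTZNanosType:
    precision: int  # p in [7, 9]

@dataclass(frozen=True)
class TimestampNTZNanosType:
    precision: int  # p in [7, 9]

# Hypothetical, simplified stand-ins for resolved expressions.
@dataclass(frozen=True)
class Attr:
    name: str
    data_type: object

@dataclass(frozen=True)
class Cast:
    child: object
    data_type: object

@dataclass(frozen=True)
class Hour:
    child: object

@dataclass(frozen=True)
class Minute:
    child: object

@dataclass(frozen=True)
class Second:
    child: object

@dataclass(frozen=True)
class SecondWithFraction:
    child: object

# Expressions the rule may rewrite; SecondWithFraction is deliberately absent.
# Further nanos-compatible expressions would be added here.
TIME_FIELD_EXPRS = (Hour, Minute, Second)

def micro_type_for(data_type):
    """Map a nanosecond timestamp type to its microsecond counterpart, else None."""
    if isinstance(data_type, TimestampNTZNanosType):
        return TimestampNTZType()
    if isinstance(data_type, TimestampLTZNanosType):
        return TimestampType()
    return None

def resolve_timestamp_nanos_expressions(expr, nanos_enabled=True):
    """Rewrite Hour/Minute/Second over a nanos child into expr(Cast(child, microType))."""
    if not nanos_enabled or not isinstance(expr, TIME_FIELD_EXPRS):
        return expr
    micro_type = micro_type_for(expr.child.data_type)
    if micro_type is None:
        return expr  # child is already a microsecond type (or unrelated): nothing to do
    return type(expr)(Cast(expr.child, micro_type))
```

After one application, the child is a microsecond Cast, so a second application leaves the expression unchanged. This idempotence matters because Spark's analyzer runs rule batches until a fixed point.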

Out of scope:
- SecondWithFraction (the extract(SECOND) path returning DECIMAL(8,6)) is
excluded because its result depends on the sub-microsecond digits.
- Other timestamp expressions that return a timestamp, read sub-second
precision, or compare/order/hash the value require genuine nanos-aware
evaluation and are handled separately.
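
The following Python sketch shows why SecondWithFraction cannot reuse the cast. Purely for illustration, it assumes a nanosecond-scale decimal result for nanosecond inputs; the issue does not specify that result type. It also reads the value in UTC for simplicity. Both function names are hypothetical. Two values that differ only in nanosWithinMicro produce the same DECIMAL(8,6)-style result after the cast, but differ at nanosecond scale.

```python
from decimal import Decimal

MICROS_PER_MINUTE = 60_000_000


def second_with_fraction_micros(epoch_micros: int) -> Decimal:
    """Seconds within the minute at microsecond scale, like DECIMAL(8,6) (UTC only)."""
    # Python's % with a positive divisor is non-negative, so pre-1970 values work too.
    return Decimal(epoch_micros % MICROS_PER_MINUTE).scaleb(-6)


def second_with_fraction_nanos(epoch_micros: int, nanos_within_micro: int) -> Decimal:
    """Hypothetical nanosecond-scale variant that keeps the sub-microsecond digits."""
    nanos_in_minute = (epoch_micros % MICROS_PER_MINUTE) * 1000 + nanos_within_micro
    return Decimal(nanos_in_minute).scaleb(-9)
```

Consider a value 5.123456 seconds into its minute. The cast-down result is 5.123456 for every nanosWithinMicro, while the nanosecond-scale result for nanosWithinMicro = 789 is 5.123456789.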

This change is gated by spark.sql.timestampNanosTypes.enabled.