Max Gekk created SPARK-57315:
--------------------------------

             Summary: Support HOUR, MINUTE and SECOND functions over nanosecond-precision timestamps
                 Key: SPARK-57315
                 URL: https://issues.apache.org/jira/browse/SPARK-57315
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk
            Assignee: Max Gekk


The nanosecond-precision timestamp types TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p)
(p in [7, 9]) are currently being added to Spark SQL. Their physical value is
TimestampNanosVal(epochMicros: Long, nanosWithinMicro: Short).

The time-of-day extraction functions hour(), minute() and second() do not yet
accept these types. They are implemented by the GetTimeField expressions
(Hour, Minute, Second), whose inputTypes is AnyTimestampType, which only accepts
the microsecond TimestampType and TimestampNTZType. As a result, calling these
functions on a TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) value fails analysis.

These three functions return an integer field (hour 0-23, minute 0-59, second
0-59) that depends only on epochMicros (and, for the LTZ type, the session time
zone); the sub-microsecond digits never affect the result. We can therefore
reuse the existing expressions and DateTimeUtils logic by casting the nanosecond
input down to the matching microsecond type before evaluation:
  - TimestampNTZNanosType(p) -> TimestampNTZType (UTC / wall-clock extraction)
  - TimestampLTZNanosType(p) -> TimestampType   (session-zone extraction)

The cast (already available, SPARK-57293) keeps epochMicros and drops
nanosWithinMicro, which is lossless for these integer results.
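
The losslessness argument above can be checked with a small standalone
sketch. It is not Spark code: the Python class only mimics the
TimestampNanosVal layout, cast_to_micros and time_fields are hypothetical
helper names, nanosWithinMicro is assumed to lie in 0..999, and only the
UTC / wall-clock (NTZ) case is modeled.

```python
from dataclasses import dataclass

MICROS_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class TimestampNanosVal:
    """Toy stand-in for the physical value; not the Spark class."""
    epoch_micros: int        # microseconds since 1970-01-01T00:00:00
    nanos_within_micro: int  # assumed range 0..999


def cast_to_micros(v: TimestampNanosVal) -> int:
    """Model of the down-cast: keep epoch_micros, drop nanos_within_micro."""
    return v.epoch_micros


def time_fields(epoch_micros: int) -> tuple:
    """Return (hour, minute, second) for a wall-clock microsecond timestamp."""
    # Floor division keeps pre-1970 (negative) values on the correct second.
    secs_of_day = (epoch_micros // MICROS_PER_SECOND) % SECONDS_PER_DAY
    return secs_of_day // 3600, (secs_of_day % 3600) // 60, secs_of_day % 60


# 13:45:59.999999 on 1970-01-01, the last microsecond before the next second.
base = (13 * 3600 + 45 * 60 + 59) * MICROS_PER_SECOND + 999_999

# Varying the sub-microsecond part never changes the extracted fields.
fields = [time_fields(cast_to_micros(TimestampNanosVal(base, n)))
          for n in (0, 1, 999)]
```

Every entry of fields is the same triple, because the nanosecond remainder
can never carry into the next second once epochMicros is fixed.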

Implementation:
  - Add a dedicated analyzer rule (ResolveTimestampNanosExpressions), modeled on
    ResolveBinaryArithmetic, that rewrites a resolved Hour/Minute/Second whose
    child is a nanosecond timestamp type into <expr>(Cast(child, microType)).
    The rule is preferred over a TypeCoercion rule so the behavioral change
    stays scoped to these functions rather than every AnyTimestampType
    expression.
  - The rule is named generically so future nanos-aware expressions can be added
    as additional case branches.
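
The rewrite described above can be sketched on a toy expression tree. This
is only an illustration under stated assumptions: the classes below are
hypothetical Python stand-ins that borrow names from this issue, not
Catalyst classes, and the Year node exists only to show an expression the
rule leaves alone.

```python
from dataclasses import dataclass


# Toy data types; names follow the issue text, the classes are hypothetical.
@dataclass(frozen=True)
class TimestampNTZType:
    pass


@dataclass(frozen=True)
class TimestampType:
    pass


@dataclass(frozen=True)
class TimestampNTZNanosType:
    precision: int


@dataclass(frozen=True)
class TimestampLTZNanosType:
    precision: int


# Toy expression nodes.
@dataclass(frozen=True)
class Column:
    name: str
    data_type: object


@dataclass(frozen=True)
class Cast:
    child: object
    data_type: object


@dataclass(frozen=True)
class Hour:
    child: object


@dataclass(frozen=True)
class Minute:
    child: object


@dataclass(frozen=True)
class Second:
    child: object


@dataclass(frozen=True)
class Year:  # stands for any expression outside the rule's scope
    child: object


def micro_type_for(data_type):
    """Map a nanosecond timestamp type to its microsecond counterpart."""
    if isinstance(data_type, TimestampNTZNanosType):
        return TimestampNTZType()
    if isinstance(data_type, TimestampLTZNanosType):
        return TimestampType()
    return None


def resolve_timestamp_nanos(expr):
    """Wrap the child of Hour/Minute/Second in a Cast when it is nanos-typed."""
    if isinstance(expr, (Hour, Minute, Second)):
        target = micro_type_for(getattr(expr.child, "data_type", None))
        if target is not None:
            return type(expr)(Cast(expr.child, target))
    # Everything else, including already-microsecond inputs, is left as-is.
    return expr


ts = Column("ts", TimestampNTZNanosType(9))
rewritten = resolve_timestamp_nanos(Hour(ts))
```

Here rewritten equals Hour(Cast(ts, TimestampNTZType())), while Year(ts)
would pass through unchanged. A further nanos-aware expression would be one
more entry in the isinstance tuple, or its own branch.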

Out of scope:
  - SecondWithFraction (the extract(SECOND) path returning DECIMAL(8,6)) is
    excluded because its result depends on the sub-microsecond digits.
  - Other timestamp expressions that return a timestamp, read sub-second
    precision, or compare/order/hash the value require genuine nanos-aware
    evaluation and are handled separately.
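
To see why a fractional-second result cannot take the same shortcut, here
is a rough sketch. The function second_with_fraction is a hypothetical
helper, not Spark's SecondWithFraction, and the nine-digit scale is an
assumption made only for the illustration.

```python
from decimal import Decimal

MICROS_PER_MINUTE = 60_000_000


def second_with_fraction(epoch_micros: int, nanos_within_micro: int) -> Decimal:
    """Seconds of the minute with nanosecond fraction (illustrative only)."""
    micros_of_minute = epoch_micros % MICROS_PER_MINUTE
    nanos_of_minute = micros_of_minute * 1000 + nanos_within_micro
    # Shift the decimal point nine places to express the value in seconds.
    return Decimal(nanos_of_minute).scaleb(-9)


epoch_micros = 59_999_999  # 00:00:59.999999 on 1970-01-01

truncated = second_with_fraction(epoch_micros, 0)   # after dropping nanos
full = second_with_fraction(epoch_micros, 999)      # with the nanos kept
```

The two values share the integer second 59 but differ in the fraction, so
dropping nanosWithinMicro would change the answer for this kind of
expression.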

This change is gated by spark.sql.timestampNanosTypes.enabled.


