[ https://issues.apache.org/jira/browse/SPARK-57339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-57339:
-----------------------------------
Labels: pull-request-available (was: )
> Format nanosecond-precision timestamp literals in Literal.toString and Literal.sql
> ----------------------------------------------------------------------------------
>
> Key: SPARK-57339
> URL: https://issues.apache.org/jira/browse/SPARK-57339
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. Background
> As part of the nanosecond timestamp preview (SPARK-56822), the types
> {{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_LTZ(p)}} (with {{p}} in {{[7, 9]}}) are
> represented by literal values of type {{TimestampNanosVal}}.
> In {{Literal}}, both {{toString}} and {{sql}} have explicit, nicely formatted
> cases for every other temporal literal type ({{DateType}}, {{TimeType}},
> {{TimestampType}}, {{TimestampNTZType}}), but the two nanosecond timestamp types
> have no dedicated case and fall through to a generic default:
> * {{Literal.toString}} -> {{case _ => other.toString}}, i.e. it prints the raw
> {{TimestampNanosVal.toString}}.
> * {{Literal.sql}} -> no {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} case at all.
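A heavily simplified, hypothetical model of this dispatch shows how a nanosecond value ends up in the default branch. The {{FallThroughSketch}} object, its {{DataType}} stand-ins and the two-field {{TimestampNanosVal}} (microseconds since the epoch plus a 0-999 nanosecond remainder, inferred from the plan output quoted under Problem) are illustrative assumptions, not Spark's real classes.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical, heavily simplified model of the Literal.toString dispatch.
// None of these types are Spark's real classes.
object FallThroughSketch {
  sealed trait DataType
  case object TimestampType extends DataType                           // microsecond timestamps
  final case class TimestampNanosType(precision: Int) extends DataType // TIMESTAMP_*(7..9)

  // Assumed encoding: microseconds since the epoch plus 0-999 extra nanoseconds.
  final case class TimestampNanosVal(epochMicros: Long, nanosWithinMicro: Int)

  def render(value: Any, dataType: DataType): String = (value, dataType) match {
    // Dedicated case: microsecond timestamps get a formatted string.
    case (micros: Long, TimestampType) =>
      Instant.EPOCH.plus(micros, ChronoUnit.MICROS).toString
    // Generic default: anything else, including TimestampNanosVal, prints its raw toString.
    case (other, _) => other.toString
  }
}
```

The exact raw text in Spark comes from the real {{TimestampNanosVal.toString}}, so it differs slightly from this case-class output, but the mechanism is the same: with no dedicated case, the default wins.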
> h2. Problem
> The raw {{TimestampNanosVal}} representation leaks into user-facing output such
> as analyzed plans, schemas and generated SQL. For example, the analyzer result
> of {{SELECT hour(TIMESTAMP_LTZ '2020-01-01 13:24:35.123456789')}} contains:
> {code:sql}
> Project [hour(cast(TimestampNanosVal(1577913875123456, 789) as timestamp),
> ...) AS hour(TimestampNanosVal(1577913875123456, 789))#x]
> {code}
> instead of a readable, round-trippable literal.
> This was raised during the review of [PR
> #56368|https://github.com/apache/spark/pull/56368].
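Decoding the raw pair by hand shows which literal the plan should have printed. This sketch assumes the first field of {{TimestampNanosVal}} is microseconds since 1970-01-01T00:00:00Z and the second is the extra nanoseconds within that microsecond; under that assumption, {{TimestampNanosVal(1577913875123456, 789)}} matches the query's literal when the session time zone is America/Los_Angeles. {{DecodeNanosSketch}} and its methods are hypothetical helpers.

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.Locale

// Hypothetical helper: decodes an assumed (epochMicros, nanosWithinMicro) pair.
object DecodeNanosSketch {
  private val MicrosPerSecond = 1000000L

  // floorDiv/floorMod keep the split correct for pre-1970 (negative) values too.
  def toInstant(epochMicros: Long, nanosWithinMicro: Int): Instant =
    Instant.ofEpochSecond(
      Math.floorDiv(epochMicros, MicrosPerSecond),
      Math.floorMod(epochMicros, MicrosPerSecond) * 1000L + nanosWithinMicro)

  private val fullNanos =
    DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSSSSSSS", Locale.US)

  // Wall-clock rendering of the instant in the given zone, always with 9 fractional digits.
  def localString(epochMicros: Long, nanosWithinMicro: Int, zone: ZoneId): String =
    toInstant(epochMicros, nanosWithinMicro).atZone(zone).format(fullNanos)
}
```

For example, {{DecodeNanosSketch.localString(1577913875123456L, 789, ZoneId.of("America/Los_Angeles"))}} yields {{2020-01-01 13:24:35.123456789}}, the same text as the original literal.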
> h2. Expected
> Add explicit cases for the nanosecond timestamp types so that the formatting is
> consistent with the microsecond timestamp types:
> * {{Literal.toString}} renders the value as a formatted timestamp string with up
> to 9 fractional digits.
> * {{Literal.sql}} emits typed literals, e.g.
> {{TIMESTAMP_NTZ '2018-02-14 12:58:59.123456789'}} /
> {{TIMESTAMP_LTZ '2020-01-01 13:24:35.123456789'}}.
> Also review the other {{value}}/{{dataType}} match sites in {{Literal}} (e.g.
> {{jsonFields}}, {{default}}, codegen) for the same missing nanos cases.
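One possible shape for the two outputs, sketched with {{java.time}}: a formatter that prints up to 9 fractional digits (dropping trailing zeros, and the decimal point when the fraction is zero) plus a typed-literal wrapper for {{sql}}. The {{NanosLiteralSketch}} object, the {{Ntz}}/{{Ltz}} markers, the (micros, nanos) encoding, and the convention that NTZ values are wall-clock time encoded against UTC (as Spark does for microsecond {{TIMESTAMP_NTZ}}) are assumptions for illustration; the actual change may instead build on Spark's existing timestamp formatters.

```scala
import java.time.{Instant, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField
import java.util.Locale

// Hypothetical sketch of Literal.toString / Literal.sql output for nanosecond timestamps.
object NanosLiteralSketch {
  // Stand-ins for TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p); not Spark's real types.
  sealed trait NanosKind
  case object Ntz extends NanosKind
  case object Ltz extends NanosKind

  // "uuuu-MM-dd HH:mm:ss" plus 0-9 fractional digits: trailing zeros are dropped,
  // and the decimal point is omitted when the fraction is zero.
  private val formatter = new DateTimeFormatterBuilder()
    .appendPattern("uuuu-MM-dd HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
    .toFormatter(Locale.US)

  // Assumed encoding: microseconds since the epoch plus 0-999 extra nanoseconds.
  private def toInstant(epochMicros: Long, nanosWithinMicro: Int): Instant =
    Instant.ofEpochSecond(
      Math.floorDiv(epochMicros, 1000000L),
      Math.floorMod(epochMicros, 1000000L) * 1000L + nanosWithinMicro)

  // toString-style text. NTZ is wall-clock time encoded against UTC, so the session
  // zone is ignored; LTZ is an instant shown in the session zone.
  def format(kind: NanosKind, epochMicros: Long, nanosWithinMicro: Int, sessionZone: ZoneId): String = {
    val zone: ZoneId = kind match {
      case Ntz => ZoneOffset.UTC
      case Ltz => sessionZone
    }
    toInstant(epochMicros, nanosWithinMicro).atZone(zone).format(formatter)
  }

  // sql-style text: a typed literal that the parser could read back.
  def sql(kind: NanosKind, epochMicros: Long, nanosWithinMicro: Int, sessionZone: ZoneId): String = {
    val prefix = kind match {
      case Ntz => "TIMESTAMP_NTZ"
      case Ltz => "TIMESTAMP_LTZ"
    }
    s"$prefix '${format(kind, epochMicros, nanosWithinMicro, sessionZone)}'"
  }
}
```

Using {{appendFraction(NANO_OF_SECOND, 0, 9, true)}} keeps the text as short as the value allows, so a value ending in {{59.100000000}} renders as {{59.1}} and a whole second renders with no fraction at all.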
> h2. Scope
> * {{sql/catalyst/.../expressions/literals.scala}}: {{Literal.toString}} and
> {{Literal.sql}} (and any related match sites).
> * A formatter producing nanosecond precision for the new types.
> h2. Tests
> * Unit tests for {{Literal.toString}} / {{Literal.sql}} over
> {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} for {{p}} in {{[7, 9]}}.
> * Update affected golden files (e.g. {{timestamp-ltz-nanos.sql.out}}) once the
> formatting changes.
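The unit tests could include a round-trip check along the lines of this sketch: for each {{p}} in {{[7, 9]}} it builds a value with exactly {{p}} significant fractional digits, renders it, and parses the text back. Treating {{TIMESTAMP_NTZ(p)}} values as truncated to {{p}} digits, and the {{RoundTripSketch}} helpers themselves, are assumptions; real tests would go through {{Literal}} and Spark's own test utilities.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField
import java.util.Locale

// Hypothetical round-trip check for nanosecond timestamp text; not Spark test code.
object RoundTripSketch {
  // Variable-length fraction: up to 9 digits, trailing zeros dropped.
  private val formatter = new DateTimeFormatterBuilder()
    .appendPattern("uuuu-MM-dd HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
    .toFormatter(Locale.US)

  def render(ldt: LocalDateTime): String = ldt.format(formatter)

  // Assumption: a TIMESTAMP_NTZ(p) value keeps only the first p fractional digits.
  def truncate(ldt: LocalDateTime, p: Int): LocalDateTime = {
    require(p >= 0 && p <= 9, s"precision out of range: $p")
    val unit = List.fill(9 - p)(10).product // 10^(9 - p) nanoseconds
    ldt.withNano(ldt.getNano - ldt.getNano % unit)
  }

  // True when the rendered text parses back to exactly the same value.
  def roundTrips(ldt: LocalDateTime): Boolean =
    LocalDateTime.parse(render(ldt), formatter) == ldt

  // Number of digits printed after the decimal point (0 if there is none).
  def fractionDigits(text: String): Int = {
    val dot = text.indexOf('.')
    if (dot < 0) 0 else text.length - dot - 1
  }
}
```

Starting from {{2018-02-14 12:58:59.123456789}}, truncating to {{p = 7}} renders as {{2018-02-14 12:58:59.1234567}}, and each rendered value parses back unchanged.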
> h2. Notes
> This is a follow-up/cleanup item under the nanosecond timestamp preview
> umbrella (SPARK-56822) and is independent of the HOUR/MINUTE/SECOND support
> added in SPARK-57315.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)