Max Gekk created SPARK-57339:
--------------------------------
Summary: Format nanosecond-precision timestamp literals in
Literal.toString and Literal.sql
Key: SPARK-57339
URL: https://issues.apache.org/jira/browse/SPARK-57339
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. Background
As part of the nanosecond timestamp preview (SPARK-56822), the types
{{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_LTZ(p)}} (with {{p}} in {{[7, 9]}}) are
represented by literal values of type {{TimestampNanosVal}}.
In {{Literal}}, both {{toString}} and {{sql}} have explicit, nicely-formatted
cases for every other temporal literal type ({{DateType}}, {{TimeType}},
{{TimestampType}}, {{TimestampNTZType}}), but the two nanosecond timestamp types
have no dedicated case and fall through to a generic default:
* {{Literal.toString}} -> {{case _ => other.toString}}, i.e. it prints the raw
{{TimestampNanosVal.toString}}.
* {{Literal.sql}} -> no {{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} case at all.
h2. Problem
The raw {{TimestampNanosVal}} representation leaks into user-facing output such
as analyzed plans, schemas and generated SQL. For example, the analyzed plan
of {{SELECT hour(TIMESTAMP_LTZ '2020-01-01 13:24:35.123456789')}} contains:
{code:sql}
Project [hour(cast(TimestampNanosVal(1577913875123456, 789) as timestamp), ...)
AS hour(TimestampNanosVal(1577913875123456, 789))#x]
{code}
instead of a readable, round-trippable literal.
This was raised during the review of [PR
#56368|https://github.com/apache/spark/pull/56368].
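To illustrate why the raw form is hard to read, the pair from the plan above can be decoded by hand. This is only a sketch under two assumptions that this ticket does not state: that the first field is microseconds since the epoch and the second is the extra nanoseconds within that microsecond, and that the plan was produced with a session time zone of {{America/Los_Angeles}}. Both are consistent with the literal in the query. {{DecodeNanosVal}} and {{toInstant}} are hypothetical names, not Spark APIs.

```scala
import java.time.{Instant, ZoneId}

object DecodeNanosVal {
  // Assumed layout of TimestampNanosVal(a, b):
  //   a = microseconds since the epoch, b = extra nanoseconds within that microsecond.
  def toInstant(epochMicros: Long, nanosWithinMicro: Int): Instant = {
    // floorDiv/floorMod keep the fraction non-negative for pre-epoch values.
    val seconds = Math.floorDiv(epochMicros, 1000000L)
    val microOfSecond = Math.floorMod(epochMicros, 1000000L)
    Instant.ofEpochSecond(seconds, microOfSecond * 1000L + nanosWithinMicro)
  }

  def main(args: Array[String]): Unit = {
    val instant = toInstant(1577913875123456L, 789)
    println(instant) // 2020-01-01T21:24:35.123456789Z
    // Wall-clock time in the assumed session zone: 2020-01-01T13:24:35.123456789
    println(instant.atZone(ZoneId.of("America/Los_Angeles")).toLocalDateTime)
  }
}
```

Under those assumptions the opaque pair {{(1577913875123456, 789)}} is exactly the timestamp written in the query, which is what the plan should display.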
h2. Expected
Add explicit cases for the nanosecond timestamp types so that the formatting is
consistent with the microsecond timestamp types:
* {{Literal.toString}} renders the value as a formatted timestamp string with up
to 9 fractional digits.
* {{Literal.sql}} emits typed literals, e.g.
{{TIMESTAMP_NTZ '2018-02-14 12:58:59.123456789'}} /
{{TIMESTAMP_LTZ '2020-01-01 13:24:35.123456789'}}.
Also review the other {{value}}/{{dataType}} match sites in {{Literal}} (e.g.
{{jsonFields}}, {{default}}, codegen) for the same missing nanos cases.
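The expected output could be produced along the following lines. This is a standalone sketch using only {{java.time}}, not the actual Spark change: {{NanosLiteralFormat}}, {{format}}, {{sqlNtz}} and {{sqlLtz}} are hypothetical names, the (epoch micros, nanos within micro) layout of {{TimestampNanosVal}} is assumed, and trimming trailing zeros from the fraction is one possible reading of "up to 9 fractional digits".

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

object NanosLiteralFormat {
  private val secondsFormat = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

  // Formats the assumed (epochMicros, nanosWithinMicro) pair in the given zone,
  // appending up to 9 fractional digits with trailing zeros removed.
  def format(epochMicros: Long, nanosWithinMicro: Int, zone: ZoneId): String = {
    val seconds = Math.floorDiv(epochMicros, 1000000L)
    val nanoOfSecond = Math.floorMod(epochMicros, 1000000L) * 1000L + nanosWithinMicro
    val local = LocalDateTime.ofInstant(Instant.ofEpochSecond(seconds, nanoOfSecond), zone)
    val fraction = f"$nanoOfSecond%09d".reverse.dropWhile(_ == '0').reverse
    val base = local.format(secondsFormat)
    if (fraction.isEmpty) base else s"$base.$fraction"
  }

  // NTZ values carry no zone, so they are rendered as if in UTC.
  def sqlNtz(epochMicros: Long, nanosWithinMicro: Int): String =
    s"TIMESTAMP_NTZ '${format(epochMicros, nanosWithinMicro, ZoneOffset.UTC)}'"

  // LTZ values are rendered in the session time zone, passed in explicitly here.
  def sqlLtz(epochMicros: Long, nanosWithinMicro: Int, sessionZone: ZoneId): String =
    s"TIMESTAMP_LTZ '${format(epochMicros, nanosWithinMicro, sessionZone)}'"

  def main(args: Array[String]): Unit = {
    println(sqlNtz(1518613139123456L, 789))
    println(sqlLtz(1577913875123456L, 789, ZoneId.of("America/Los_Angeles")))
  }
}
```

With these inputs the two lines printed match the typed literals listed above, so the output can be pasted back into a query. Whether the real formatter should trim trailing zeros or pad to the declared precision {{p}} is left open here.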
h2. Scope
* {{sql/catalyst/.../expressions/literals.scala}}: {{Literal.toString}} and
{{Literal.sql}} (and any related match sites).
* A formatter producing nanosecond precision for the new types.
h2. Tests
* Unit tests for {{Literal.toString}} / {{Literal.sql}} over
{{TIMESTAMP_NTZ(p)}} / {{TIMESTAMP_LTZ(p)}} for {{p}} in {{[7, 9]}}.
* Update affected golden files (e.g. {{timestamp-ltz-nanos.sql.out}}) once the
formatting changes.
h2. Notes
This is a follow-up/cleanup item under the nanosecond timestamp preview
umbrella (SPARK-56822) and is independent of the HOUR/MINUTE/SECOND support
added in SPARK-57315.