Max Gekk created SPARK-57285:
--------------------------------
Summary: Route nanosecond timestamp cast-to-string through the
Types Framework in both interpreted and codegen paths
Key: SPARK-57285
URL: https://issues.apache.org/jira/browse/SPARK-57285
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. Background
SPARK-57256 implemented {{CAST(TIMESTAMP_NTZ(p) / TIMESTAMP_LTZ(p) AS STRING)}}
for p in [7, 9]. The formatting currently lives in {{ToStringBase}} (alongside
the microsecond timestamp types): the interpreted path explicitly bypasses
{{TypeApiOps}}, and the codegen path inlines {{TimestampFormatter.formatNanos}}
/ {{formatWithoutTimeZoneNanos}}. This was done because the Types Framework hook
{{TypeApiOps.format(v)}} takes no time zone and so cannot render LTZ in the
session time zone; for zone-less callers it still deliberately raises
{{UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING}}.
This leaves nanosecond cast-to-string as a one-off integration outside the
framework, which is inconsistent with the SPIP direction of wiring the new
types through the centralized {{TypeOps}} / {{TypeApiOps}} (see SPARK-57101 /
SPARK-57207).
h2. Goal
Make the Types Framework the single integration point for nanosecond timestamp
cast-to-string, for both the interpreted and codegen paths, while producing the
same output as SPARK-57256 (zone-aware LTZ, zone-independent NTZ, precision
flooring, trailing-zero trimming).
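The four output properties listed above can be illustrated with a small, self-contained sketch built only on {{java.time}}. It is not the Spark implementation: the class and method names are hypothetical, the {{yyyy-MM-dd HH:mm:ss[.fraction]}} layout is an assumption, and values are modelled as {{LocalDateTime}} (NTZ) and {{Instant}} (LTZ) rather than Spark's internal representation.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical illustration only; none of these names exist in Spark.
final class NanosToStringSketch {
    // Assumed date-time layout up to whole seconds.
    private static final DateTimeFormatter SECONDS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss", Locale.ROOT);

    // Floors nano-of-second to `precision` fractional digits (7..9),
    // then trims trailing zeros. Returns "" when nothing remains.
    static String fraction(int nanoOfSecond, int precision) {
        if (precision < 7 || precision > 9) {
            throw new IllegalArgumentException("precision must be in [7, 9]: " + precision);
        }
        int unit = 1;
        for (int i = precision; i < 9; i++) {
            unit *= 10; // 100 for p=7, 10 for p=8, 1 for p=9
        }
        // nanoOfSecond is never negative in java.time, so this is a true floor.
        int floored = nanoOfSecond - (nanoOfSecond % unit);
        if (floored == 0) {
            return "";
        }
        String digits = String.format(Locale.ROOT, "%09d", floored);
        int end = digits.length();
        while (digits.charAt(end - 1) == '0') {
            end--;
        }
        return "." + digits.substring(0, end);
    }

    // NTZ: the wall-clock value is rendered as-is, independent of any zone.
    static String formatNtz(LocalDateTime value, int precision) {
        return SECONDS.format(value) + fraction(value.getNano(), precision);
    }

    // LTZ: the instant is first shifted into the given (session) zone.
    static String formatLtz(Instant value, ZoneId zone, int precision) {
        LocalDateTime local = LocalDateTime.ofInstant(value, zone);
        return SECONDS.format(local) + fraction(local.getNano(), precision);
    }
}
```

Under these assumptions, {{formatNtz}} applied to 2024-01-02 03:04:05.123456789 with precision 7 yields {{2024-01-02 03:04:05.1234567}}, while the same instant passed to {{formatLtz}} changes its rendered hour with the zone argument.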
h2. Proposed approach
* Interpreted path: extend the framework formatting hook with the session zone
(e.g. an optional {{zoneId}} parameter on {{format}} / {{formatUTF8}}), and
implement zone-aware formatting in {{TimestampNTZNanosTypeApiOps}} /
{{TimestampLTZNanosTypeApiOps}} using the sql/api {{TimestampFormatter}}
({{formatWithoutTimeZoneNanos}} for NTZ, {{formatNanos}} with {{zoneId}} for
LTZ). Thread {{ToStringBase}}'s {{zoneId}} into the dispatch, then remove the
{{castToStringDefault}} nanos cases and the current {{TypeApiOps}} bypass.
* Codegen path: {{TypeApiOps}} has no codegen hook today (each type is
hand-written in {{ToStringBase.castToStringCode}}). Add a framework codegen
hook (a method that emits the format snippet), or have {{castToStringCode}}
emit a runtime call into the ops reference object passing the {{zoneId}}
literal; then drop the inlined {{formatNanos}} cases.
* Zone-less callers: reconcile the zone-less {{format()}} / {{toSQLValue()}} entry
points (used by EXPLAIN and SQL-literal rendering) with the zone-aware hook. NTZ
needs no zone and can format directly; LTZ without a session zone either keeps
raising or falls back to a documented default. Update
{{TimestampNanosTypeOpsSuite}} accordingly.
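One possible shape for the approach above, sketched in plain Java so it can run without Spark. Everything here is an assumption made for illustration: {{NanosFormatOps}}, {{NtzNanosOps}}, {{LtzNanosOps}} and {{CastCodegenSketch}} are hypothetical stand-ins and not the real {{TypeApiOps}} API, the values are modelled with {{java.time}} types, and precision flooring is left out to keep the sketch short.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;
import java.util.Optional;

// Hypothetical stand-in for a formatting hook extended with an optional zone.
interface NanosFormatOps<T> {
    // Assumed layout; the fraction is printed with 0..9 digits, trailing zeros trimmed.
    DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
        .appendPattern("uuuu-MM-dd HH:mm:ss")
        .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
        .toFormatter(Locale.ROOT);

    // Zone-aware entry point, as the cast path would call it.
    String format(T value, Optional<ZoneId> zoneId);

    // Zone-less entry point, as EXPLAIN-style callers would use it.
    default String format(T value) {
        return format(value, Optional.empty());
    }
}

// NTZ never needs a zone, so both entry points produce the same string.
final class NtzNanosOps implements NanosFormatOps<LocalDateTime> {
    @Override
    public String format(LocalDateTime value, Optional<ZoneId> zoneId) {
        return FORMAT.format(value);
    }
}

// LTZ renders in the supplied zone and keeps raising when none is given.
final class LtzNanosOps implements NanosFormatOps<Instant> {
    @Override
    public String format(Instant value, Optional<ZoneId> zoneId) {
        ZoneId zone = zoneId.orElseThrow(() -> new UnsupportedOperationException(
            "UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING: no session time zone"));
        return FORMAT.format(LocalDateTime.ofInstant(value, zone));
    }
}

// Codegen option: emit Java source that calls into the ops object at runtime.
// The argument names are placeholders for generated variable/reference names.
final class CastCodegenSketch {
    static String emitFormatCall(String opsRef, String valueVar, String zoneRef) {
        return opsRef + ".format(" + valueVar + ", java.util.Optional.of(" + zoneRef + "))";
    }
}
```

The design choice shown here is the "keeps raising" variant for zone-less LTZ; substituting a documented default zone would only change the {{orElseThrow}} line. The codegen helper corresponds to the runtime-call alternative, as opposed to a hook that emits the full formatting snippet.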
h2. Out of scope
* The microsecond timestamp types ({{TIMESTAMP}} / {{TIMESTAMP_NTZ}}), which
remain handled inline in {{ToStringBase}}.
* Any change to the rendered string output: this is a refactor with no
user-facing behavior change.
h2. Testing
Existing {{CastWithAnsiOnSuite}} / {{CastWithAnsiOffSuite}},
{{ToPrettyStringSuite}}, {{TimestampNanosRowSuite}}, and the {{cast.sql}}
golden files must stay green unchanged; add framework-level coverage for the
new zone-aware {{format}} hook in both eval modes.