[ https://issues.apache.org/jira/browse/SPARK-57256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Max Gekk reassigned SPARK-57256:
--------------------------------
Assignee: Max Gekk
> Cast nanosecond-precision timestamps to string
> ----------------------------------------------
>
> Key: SPARK-57256
> URL: https://issues.apache.org/jira/browse/SPARK-57256
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. What
> Implement casting of the nanosecond-precision timestamp types
> {{TIMESTAMP_NTZ(p)}} ({{TimestampNTZNanosType}}) and {{TIMESTAMP_LTZ(p)}}
> ({{TimestampLTZNanosType}}), {{p}} in [7, 9], to {{STRING}}.
> Casting is implemented in {{ToStringBase}} (mixed into {{Cast}}), so this
> change also fixes {{ToPrettyString}} (and therefore {{Dataset.show()}}) for
> these types via the shared base.
> The change wires the SPARK-57162 formatter methods into the existing
> cast-to-string paths (interpreted and codegen):
> * {{TimestampLTZNanosType(p)}} -> {{TimestampFormatter.formatNanos(v, p)}}
> (renders in the session time zone).
> * {{TimestampNTZNanosType(p)}} ->
> {{TimestampFormatter.formatWithoutTimeZoneNanos(v, p)}} (zone-independent;
> the stored value is rendered as its UTC wall-clock time).
> The fractional-second precision {{p}} is taken from the source type;
> fractional digits beyond the first {{p}} are dropped (floored, not rounded)
> and trailing zeros are trimmed, consistent with
> the microsecond cast path (both use {{FractionTimestampFormatter}}).
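The flooring and trimming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the issue, not Spark's code: the function name `format_fraction` and its nanos-of-second input are hypothetical.

```python
def format_fraction(nanos_of_second: int, p: int) -> str:
    """Render the fractional-second suffix for precision p (hypothetical helper)."""
    if not 7 <= p <= 9:
        raise ValueError("p must be in [7, 9]")
    if not 0 <= nanos_of_second < 1_000_000_000:
        raise ValueError("nanos_of_second must be in [0, 999999999]")
    # Zero-pad to 9 digits, then keep only the first p digits (floor, no rounding).
    digits = f"{nanos_of_second:09d}"[:p]
    # Trim trailing zeros; an all-zero fraction yields no suffix at all.
    digits = digits.rstrip("0")
    return "." + digits if digits else ""


print(format_fraction(123_456_789, 9))  # full nanosecond precision
print(format_fraction(123_456_789, 7))  # digits beyond the 7th are dropped
print(format_fraction(120_000_000, 8))  # trailing zeros are trimmed
```

Under these assumptions a value with precision 7 never shows more than seven fractional digits, and a whole-second value has no fractional part in its string form.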
> {{Cast.needsTimeZone}} is extended so that {{TimestampLTZNanosType ->
> StringType}} resolves the session time zone (mirroring {{TimestampType ->
> StringType}}); the NTZ variant does not need a time zone.
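To illustrate why only the LTZ cast needs a time zone, here is a minimal Python sketch. It assumes the stored value counts microseconds since the Unix epoch, and it uses a fixed UTC offset as a stand-in for the session time zone (a real session zone may have DST rules). The helper `wall_clock` is hypothetical and renders to second resolution only.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def wall_clock(epoch_micros: int, session_zone: Optional[timezone] = None) -> str:
    """Render epoch microseconds as a wall-clock string (hypothetical helper).

    session_zone=None models the NTZ case: the value is shown on the UTC grid.
    A tzinfo models the LTZ case: the instant is shifted into that zone first.
    """
    instant = EPOCH_UTC + timedelta(microseconds=epoch_micros)
    if session_zone is not None:
        instant = instant.astimezone(session_zone)
    return instant.strftime("%Y-%m-%d %H:%M:%S")


stored = 1_577_836_800_123_456  # 2020-01-01 00:00:00.123456 on the UTC grid
print(wall_clock(stored))                                  # NTZ: unshifted
print(wall_clock(stored, timezone(timedelta(hours=-8))))   # LTZ: shifted by -08:00
```

The same stored value produces different text for LTZ depending on the zone, while the NTZ rendering is fixed, which matches the asymmetry in {{Cast.needsTimeZone}} described above.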
> h2. Why
> Today {{Cast}} permits these casts at analysis time (the generic {{(_,
> StringType)}} rule), but at runtime the nanosecond types have no dedicated
> case in {{ToStringBase}} and fall through to the default
> {{String.valueOf(...)}} branch, producing the internal form
> {{TimestampNanosVal(epochMicros, nanosWithinMicro)}} instead of a proper SQL
> timestamp string. Producing a correct textual representation is a
> prerequisite for nanosecond support in expressions, {{SHOW}}/pretty output,
> and downstream text-based sinks.
> h2. Example
> With {{spark.sql.timestampNanosTypes.enabled=true}}:
> {code:sql}
> SELECT CAST(ts AS STRING);
> -- TIMESTAMP_NTZ(9) value 2020-01-01 00:00:00.123456789
> -- before: TimestampNanosVal(1577836800123456, 789)
> -- after: 2020-01-01 00:00:00.123456789
> {code}
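For reference, the split of the example value into the two internal fields can be worked out as follows. This assumes, from the field names alone, that {{epochMicros}} counts whole microseconds since the Unix epoch and {{nanosWithinMicro}} holds the remaining 0..999 nanoseconds. The function `to_nanos_val` is hypothetical and is not part of Spark.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_nanos_val(wall_clock: str, nanos_of_second: int) -> tuple:
    """Split a UTC wall-clock time into (epochMicros, nanosWithinMicro)."""
    dt = datetime.strptime(wall_clock, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # Exact integer seconds since the epoch (floor division, also valid pre-epoch).
    epoch_seconds = (dt - EPOCH_UTC) // timedelta(seconds=1)
    # The first six fractional digits are whole microseconds, the last three remain.
    micros_of_second, nanos_within_micro = divmod(nanos_of_second, 1_000)
    return epoch_seconds * 1_000_000 + micros_of_second, nanos_within_micro


print(to_nanos_val("2020-01-01 00:00:00", 123_456_789))
```

Under that assumption, 2020-01-01 00:00:00.123456789 maps to 1577836800123456 whole microseconds plus 789 nanoseconds.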
> h2. Behavior change
> User-facing only when {{spark.sql.timestampNanosTypes.enabled=true}}; these
> types are not available otherwise. Casting to string never fails, so ANSI and
> non-ANSI modes behave identically.
> h2. Dependency
> Builds on SPARK-57162 (nanosecond-aware {{TimestampFormatter}}), which
> provides {{formatNanos}} / {{formatWithoutTimeZoneNanos}}.
> h2. How tested
> New cases in {{CastSuiteBase}} (run under both ANSI on/off;
> {{checkEvaluation}} exercises the interpreted and codegen paths): precision
> 7/8/9, trailing-zero trimming, {{nanosWithinMicro}} 0 and 999, LTZ time-zone
> shift under a non-UTC session zone vs. NTZ remaining unshifted, pre-epoch and
> year-9999 boundaries, and null input.