Max Gekk created SPARK-57256:
--------------------------------

             Summary: Cast nanosecond-precision timestamps to string
                 Key: SPARK-57256
                 URL: https://issues.apache.org/jira/browse/SPARK-57256
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. What

Implement casting of the nanosecond-precision timestamp types 
{{TIMESTAMP_NTZ(p)}} ({{TimestampNTZNanosType}}) and {{TIMESTAMP_LTZ(p)}} 
({{TimestampLTZNanosType}}), {{p}} in [7, 9], to {{STRING}}.

Casting is implemented in {{ToStringBase}} (mixed into {{Cast}}), so this 
change also fixes {{ToPrettyString}} (and therefore {{Dataset.show()}}) for 
these types via the shared base.

The change wires the SPARK-57162 formatter methods into the existing 
cast-to-string paths (interpreted and codegen):
* {{TimestampLTZNanosType(p)}} -> {{TimestampFormatter.formatNanos(v, p)}} 
(renders in the session time zone).
* {{TimestampNTZNanosType(p)}} -> 
{{TimestampFormatter.formatWithoutTimeZoneNanos(v, p)}} (zone-independent, UTC 
wall-clock grid).

The fractional-second precision {{p}} is taken from the source type; sub-{{p}} 
digits are floored and trailing zeros are trimmed, consistent with the 
microsecond cast path (both use {{FractionTimestampFormatter}}).

{{Cast.needsTimeZone}} is extended so that {{TimestampLTZNanosType -> 
StringType}} resolves the session time zone (mirroring {{TimestampType -> 
StringType}}); the NTZ variant does not need a time zone.

h2. Why

Today {{Cast}} permits these casts at analysis time (the generic {{(_, 
StringType)}} rule), but at runtime the nanosecond types have no dedicated case 
in {{ToStringBase}} and fall through to the default {{String.valueOf(...)}} 
branch, producing the internal form {{TimestampNanosVal(epochMicros, 
nanosWithinMicro)}} instead of a proper SQL timestamp string. Producing a 
correct textual representation is a prerequisite for nanosecond support in 
expressions, {{SHOW}}/pretty output, and downstream text-based sinks.

h2. Example

With {{spark.sql.timestampNanosTypes.enabled=true}}:
{code:sql}
SELECT CAST(ts AS STRING);
-- TIMESTAMP_NTZ(9) value 2020-01-01 00:00:00.123456789
--   before: TimestampNanosVal(1577836800000000, 789)
--   after:  2020-01-01 00:00:00.123456789
{code}

h2. Behavior change

User-facing only when {{spark.sql.timestampNanosTypes.enabled=true}}; these 
types are not available otherwise. Casting to string never fails, so ANSI and 
non-ANSI modes behave identically.

h2. Dependency

Builds on SPARK-57162 (nanosecond-aware {{TimestampFormatter}}), which provides 
{{formatNanos}} / {{formatWithoutTimeZoneNanos}}.

h2. How tested

New cases in {{CastSuiteBase}} (run under both ANSI on/off; {{checkEvaluation}} 
exercises the interpreted and codegen paths): precision 7/8/9, trailing-zero 
trimming, {{nanosWithinMicro}} 0 and 999, LTZ time-zone shift under a non-UTC 
session zone vs. NTZ remaining unshifted, pre-epoch and year-9999 boundaries, 
and null input.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to