[ https://issues.apache.org/jira/browse/SPARK-57386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-57386:
-----------------------------------
Labels: pull-request-available (was: )
> Render nanosecond timestamp types in HiveResult through the Types Framework
> ---------------------------------------------------------------------------
>
> Key: SPARK-57386
> URL: https://issues.apache.org/jira/browse/SPARK-57386
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. Background
> The nanosecond timestamp types {{TIMESTAMP_NTZ(p)}} and {{TIMESTAMP_LTZ(p)}}
> (preview feature under SPARK-56822) are implemented solely through the Types
> Framework. External-value rendering for the framework is centralized in
> {{TypeApiOps.formatExternal}}, which already backs Row JSON ({{Row.json}} /
> {{Row.prettyJson}}).
> {{HiveResult.toHiveString}} dispatches through the framework first
> ({{TypeApiOps(dt).flatMap(_.formatExternal(value, nested))}}) and falls back
> to the legacy {{toHiveStringDefault}}. However, the nanos ops deliberately
> override the two-arg {{formatExternal(value, nested)}} to return {{None}}, so
> HiveResult instead renders nanos through inline pattern-matching in
> {{toHiveStringDefault}}. That duplicates the formatter logic and was
> documented in code as a temporary split "until nanos external rendering is
> unified across the zone-less (Row JSON) and zone-aware (Hive) paths".
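
The framework-first dispatch with legacy fallback described in the Background can be sketched roughly as follows. This is a standalone illustration, not Spark's code: {{SketchDataType}}, {{SketchOps}}, {{opsFor}}, {{legacyDefault}} and the {{nanosOptOut}} switch are hypothetical stand-ins, and the {{Option[String]}} return type is an assumption inferred from the {{flatMap}} call quoted above.

```scala
// Hypothetical stand-ins for illustration only; none of these are Spark classes.
sealed trait SketchDataType
case object SketchNanosType extends SketchDataType
case object SketchIntType extends SketchDataType

trait SketchOps {
  // Assumed shape: None means "the framework declines, use the legacy path".
  def formatExternal(value: Any, nested: Boolean): Option[String]
}

object DispatchSketch {
  // Framework lookup: Some(ops) only for framework-managed types.
  // nanosOptOut = true models the override that returns None.
  def opsFor(dt: SketchDataType, nanosOptOut: Boolean): Option[SketchOps] = dt match {
    case SketchNanosType =>
      Some(new SketchOps {
        def formatExternal(value: Any, nested: Boolean): Option[String] =
          if (nanosOptOut) None else Some(s"framework:$value")
      })
    case _ => None
  }

  // Stand-in for the legacy renderer.
  def legacyDefault(value: Any, dt: SketchDataType): String = s"legacy:$value"

  // Try the framework first; fall back when there are no ops or the ops decline.
  def toHiveString(
      value: Any,
      dt: SketchDataType,
      nested: Boolean,
      nanosOptOut: Boolean): String =
    opsFor(dt, nanosOptOut)
      .flatMap(_.formatExternal(value, nested))
      .getOrElse(legacyDefault(value, dt))
}
```

With {{nanosOptOut = true}} the nanos value lands in the legacy renderer, which is the duplicated path this ticket removes; with {{nanosOptOut = false}} the framework renderer handles it.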
> h2. Goal
> Unify nanosecond timestamp rendering in HiveResult onto the Types Framework,
> and remove the inline duplicate. The nanos types are a Types Framework
> feature and must NOT be supported in HiveResult when the framework is
> disabled. They are gated by
> {{timestampNanosTypesEnabled = timestampNanosTypes.enabled && types.framework.enabled}},
> so a nanos column cannot exist while the framework is off; the inline cases
> are therefore dead code in that mode and redundant when the framework is on.
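
The gating argument above can be checked with a small truth-table sketch. {{SketchConf}} and its field names are hypothetical; only the conjunction itself comes from the description.

```scala
// Hypothetical config holder; the two flags mirror the conjunction quoted above.
final case class SketchConf(nanosTypesFlag: Boolean, frameworkFlag: Boolean) {
  def timestampNanosTypesEnabled: Boolean = nanosTypesFlag && frameworkFlag
}

object GatingSketch {
  // All four flag combinations.
  val allConfs: Seq[SketchConf] = for {
    n <- Seq(false, true)
    f <- Seq(false, true)
  } yield SketchConf(n, f)

  // Nanos types enabled implies framework enabled, in every combination.
  val nanosImpliesFramework: Boolean =
    allConfs.forall(c => !c.timestampNanosTypesEnabled || c.frameworkFlag)
}
```

Since the effective flag is never true while the framework flag is false, a legacy-only branch for nanos values has no configuration under which it is needed.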
> h2. Changes
> * {{TimestampNanosTypeApiOps}}: remove the
> {{formatExternal(value, nested) = None}} override so the Hive path shares
> each subclass's single-arg {{formatExternal}} renderer (the same one Row JSON
> uses). {{nested}} does not affect timestamp formatting.
> * {{HiveResult.toHiveStringDefault}}: remove the inline
> {{TimestampLTZNanosType}} / {{TimestampNTZNanosType}} cases. The legacy path
> keeps no nanos handling, so a nanos value that somehow reaches it (only
> possible with the framework off, which the gating forbids) is unsupported
> rather than silently rendered.
> * {{TypeApiOps}}: update the two-arg {{formatExternal}} scaladoc to reflect
> that Hive now shares the single-arg renderer.
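
A minimal sketch of the first change, assuming the two-arg overload delegates to the single-arg one by default (the description states this for {{TIME}}). The trait and object names are hypothetical, and the signatures are guesses at the general shape rather than the real {{TypeApiOps}} API.

```scala
// Hypothetical trait: the two-arg overload delegates to the single-arg renderer.
trait SketchTypeOps {
  def formatExternal(value: Any): Option[String]
  def formatExternal(value: Any, nested: Boolean): Option[String] =
    formatExternal(value)
}

// Before: the two-arg overload opts out, so the Hive path must fall back.
object NanosOpsBefore extends SketchTypeOps {
  def formatExternal(value: Any): Option[String] = Some(s"ts:$value")
  override def formatExternal(value: Any, nested: Boolean): Option[String] = None
}

// After: the override is gone, so both paths share the single-arg renderer.
object NanosOpsAfter extends SketchTypeOps {
  def formatExternal(value: Any): Option[String] = Some(s"ts:$value")
}
```

In this sketch, removing the override is the whole change: the Row JSON style call and the Hive style call return the same string, regardless of {{nested}}.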
> h2. Non-goals / notes
> * {{TIME}} already renders through the framework when it is enabled (its
> single-arg {{formatExternal}} returns a value and the two-arg overload
> delegates to it); no change. The inline {{LocalTime}} case remains as the
> framework-disabled fallback, since {{TimeType}} is GA and exists
> independently of the framework flag.
> * No user-facing output change: nanos Hive output is identical (zone-aware
> LTZ, zone-independent NTZ, precision flooring, trailing-zero trimming).
> Existing {{HiveResultSuite}} "SPARK-57257" tests cover precision 7/8/9,
> pre-1970 epochs, nested arrays/maps/structs, NULLs, and session-zone vs
> zone-independent rendering, and now exercise the framework path.
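
For readers unfamiliar with the terms, here is one plausible reading of "precision flooring" and "trailing-zero trimming" for the fractional-second part, including a pre-1970 value. This is an illustrative assumption, not Spark's formatter: {{FractionSketch}} and its methods are hypothetical, and the real renderer may differ in detail.

```scala
object FractionSketch {
  private val NanosPerSecond = 1000000000L

  // Nanos within the second; floorMod keeps this non-negative for pre-1970 values.
  def nanosOfSecond(epochNanos: Long): Int =
    Math.floorMod(epochNanos, NanosPerSecond).toInt

  // Keep the first `precision` fractional digits (flooring), then trim trailing zeros.
  def fraction(nanosOfSecond: Int, precision: Int): String = {
    require(nanosOfSecond >= 0 && nanosOfSecond < NanosPerSecond, "nanos out of range")
    require(precision >= 0 && precision <= 9, "precision out of range")
    val digits = f"$nanosOfSecond%09d".take(precision)
    val trimmed = digits.reverse.dropWhile(_ == '0').reverse
    if (trimmed.isEmpty) "" else "." + trimmed
  }
}
```

Under these assumptions, {{fraction(123456789, 7)}} gives {{.1234567}}, {{fraction(100000000, 8)}} gives {{.1}}, and an epoch value of -1 nanosecond has 999999999 nanos within its second.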