[
https://issues.apache.org/jira/browse/SPARK-57257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-57257:
-----------------------------------
Labels: pull-request-available (was: )
> Support nanosecond-precision timestamps in Hive results
> -------------------------------------------------------
>
> Key: SPARK-57257
> URL: https://issues.apache.org/jira/browse/SPARK-57257
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. What
> Modify {{HiveResult}} to support the nanosecond-precision timestamp types
> {{TIMESTAMP_LTZ(p)}} ({{TimestampLTZNanosType}}) and {{TIMESTAMP_NTZ(p)}}
> ({{TimestampNTZNanosType}}), {{p}} in [7, 9].
> Add cases to {{HiveResult.toHiveStringDefault}} mirroring the existing
> microsecond timestamp cases:
> * {{(i: Instant, _: TimestampLTZNanosType)}} -> render in the session time
> zone.
> * {{(l: LocalDateTime, _: TimestampNTZNanosType)}} -> render
> zone-independently.
> Both render with the nanosecond-aware {{TimestampFormatter}} (SPARK-57162) at
> the column's fractional-second precision {{p}}, flooring sub-{{p}} digits and
> trimming trailing zeros, consistent with casting these types to string.
> {{getTimeFormatters}} already constructs a {{FractionTimestampFormatter}} via
> {{TimestampFormatter.getFractionFormatter}}, which now exposes
> {{formatNanos}} / {{formatWithoutTimeZoneNanos}}.
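
For illustration only, the rendering rule described above (keep {{p}} fractional digits, floor anything finer, then trim trailing zeros) can be sketched with plain {{java.time}}. This is not the {{TimestampFormatter}} code from SPARK-57162: the object name {{NanosRenderSketch}}, the helpers {{fraction}} and {{render}}, and the space-separated {{uuuu-MM-dd HH:mm:ss}} layout are assumptions made for this sketch.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object NanosRenderSketch {
  // Assumed layout for the seconds part; the real formatter may differ.
  private val secondsFormat = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

  // Hypothetical helper: keep the first `p` fractional digits (flooring the
  // finer ones), then drop trailing zeros. Returns "" if no digits remain.
  def fraction(nanoOfSecond: Int, p: Int): String = {
    require(p >= 0 && p <= 9, s"precision out of range: $p")
    val digits = f"$nanoOfSecond%09d".take(p)
    val trimmed = digits.reverse.dropWhile(_ == '0').reverse
    if (trimmed.isEmpty) "" else "." + trimmed
  }

  // Hypothetical helper: zone-independent rendering of a LocalDateTime
  // at fractional-second precision `p`.
  def render(ldt: LocalDateTime, p: Int): String =
    ldt.format(secondsFormat) + fraction(ldt.getNano, p)
}
```

Under these assumptions, a value with nanosecond part 123456789 keeps all nine digits at {{p}} = 9 and loses the last two at {{p}} = 7, while a value whose fraction is all zeros renders with no fractional part at all.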
> h2. Why
> Before the change, formatting a nanosecond timestamp column through
> {{HiveResult}} (e.g. end-to-end SQL / golden-file tests, {{spark-sql}} CLI,
> Thrift server output) matches none of the existing cases and fails with a
> {{MatchError}}, analogous to the {{TimeType}} issue fixed in SPARK-51517:
> {code}
> scala.MatchError
> (2020-01-01T00:00:00.123456789Z, TimestampLTZNanosType(9)) (of class
> scala.Tuple2)
> {code}
> The existing cases in {{HiveResult.scala}} match only the microsecond
> {{TimestampType}} / {{TimestampNTZType}}, so the parameterized nanos types
> are not handled.
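
The failure mode can be reproduced in isolation. The sketch below uses simplified stand-in types declared locally, not Spark's real {{DataType}} classes, and {{toString}} as a placeholder for the actual formatting. The names {{MatchErrorSketch}}, {{before}} and {{after}} are hypothetical. It only shows why a value/type pair with no matching case throws {{scala.MatchError}}, and how dedicated cases avoid it.

```scala
import java.time.{Instant, LocalDateTime}

object MatchErrorSketch {
  // Simplified stand-ins for Spark's data types; not the real classes.
  sealed trait DataType
  case object TimestampType extends DataType
  case class TimestampLTZNanosType(precision: Int) extends DataType
  case class TimestampNTZNanosType(precision: Int) extends DataType

  // Before: only the microsecond type has a case, so a nanosecond
  // column falls through every case and throws scala.MatchError.
  def before(value: Any, dt: DataType): String = (value, dt) match {
    case (i: Instant, TimestampType) => i.toString
  }

  // After: the nanosecond types get their own cases.
  // toString is a placeholder for the precision-aware formatter.
  def after(value: Any, dt: DataType): String = (value, dt) match {
    case (i: Instant, TimestampType) => i.toString
    case (i: Instant, _: TimestampLTZNanosType) => i.toString
    case (l: LocalDateTime, _: TimestampNTZNanosType) => l.toString
  }
}
```

Calling {{before}} with an {{Instant}} and {{TimestampLTZNanosType(9)}} throws a {{MatchError}} on the tuple, in the same shape as the trace above, whereas {{after}} returns a string.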
> h2. Does this PR introduce any user-facing change?
> It fixes the error above. After the change, nanosecond timestamp values are
> rendered as proper strings in Hive results (only reachable when
> {{spark.sql.timestampNanosTypes.enabled=true}}).
> h2. Dependency
> Builds on SPARK-57162 (nanosecond-aware {{TimestampFormatter}}).
> h2. How tested
> * New cases in {{HiveResultSuite}} covering {{TIMESTAMP_LTZ(p)}} /
> {{TIMESTAMP_NTZ(p)}} for {{p}} in [7, 9]: precision-driven fraction width,
> trailing-zero trimming, {{nanosWithinMicro}} 0 and 999, LTZ session-zone
> rendering vs. zone-independent NTZ, and nested (array/map/struct) values.
> * A golden-file end-to-end test (following the {{time.sql}} test added in
> SPARK-51517), excluded from {{ThriftServerQueryTestSuite}} if needed.