Max Gekk created SPARK-57161:
--------------------------------
Summary: Convert nanosecond-capable timestamp types and literals between proto and Catalyst in Spark Connect
Key: SPARK-57161
URL: https://issues.apache.org/jira/browse/SPARK-57161
Project: Spark
Issue Type: Sub-task
Components: Connect, SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
h2. What
Implement the proto <-> Catalyst conversion for {{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}} (p in [7, 9]) in the shared {{connect-common}} converters of Spark Connect, so that the protocol messages added in the proto sub-task become usable by both the JVM Connect client and the Connect server.
Parent: SPARK-56822. Depends on the Spark Connect protocol sub-task (proto definitions for the nanos timestamp data types and literals).
h2. Why
The proto sub-task only adds the wire surface; nothing yet maps those messages to or from Catalyst {{DataType}} and {{Literal}}. Until these converters exist, schema responses, casts, UDF input/output types, and literal expressions containing nanosecond timestamps fail in Connect with {{CONNECT_INVALID_PLAN.DATA_TYPE_UNSUPPORTED_*}}. Because the converters are shared by the JVM client and the server, this unblocks both at once.
The Catalyst physical value is {{org.apache.spark.unsafe.types.TimestampNanosVal}} ({{epochMicros: Long}} plus {{nanosWithinMicro: Short}} in [0, 999]).
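To illustrate how an instant maps onto the two fields, here is a minimal Scala sketch. {{NanosParts}} and {{fromEpochNanos}} are hypothetical names, not the real {{TimestampNanosVal}} API; the point is that floor division keeps {{nanosWithinMicro}} in [0, 999] even for pre-epoch instants.

```scala
// Hypothetical stand-in for the (epochMicros, nanosWithinMicro) pair carried by
// TimestampNanosVal; this is NOT the real Spark class.
final case class NanosParts(epochMicros: Long, nanosWithinMicro: Short) {
  require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
    s"nanosWithinMicro must be in [0, 999], got $nanosWithinMicro")

  // Recombine into nanoseconds since the epoch; throws ArithmeticException when the
  // instant does not fit in a Long nanosecond count (roughly years 1677 to 2262).
  def toEpochNanos: Long =
    Math.addExact(Math.multiplyExact(epochMicros, 1000L), nanosWithinMicro.toLong)
}

object NanosParts {
  // floorDiv/floorMod keep the remainder non-negative for pre-epoch values,
  // e.g. -1 ns since the epoch becomes (epochMicros = -1, nanosWithinMicro = 999).
  def fromEpochNanos(epochNanos: Long): NanosParts =
    NanosParts(Math.floorDiv(epochNanos, 1000L), Math.floorMod(epochNanos, 1000L).toShort)
}
```

With truncating division instead, -1 ns would split into (0, -1) and violate the [0, 999] invariant.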
h2. Scope
Data type conversion
* {{sql/connect/common/.../DataTypeProtoConverter.scala}}: implement both directions ({{toCatalystType}} and {{toConnectProtoType}}) for the new {{TIMESTAMP_NTZ_NANOS}} / {{TIMESTAMP_LTZ_NANOS}} kinds, reading and writing {{precision}}.
* Prefer the Types Framework path: add {{TimestampNTZNanosTypeConnectOps}} / {{TimestampLTZNanosTypeConnectOps}} (mirroring {{TimeTypeConnectOps}}) and register them in {{sql/connect/common/.../types/ops/ConnectTypeOps.scala}} ({{apply}}, {{opsForKindCase}}).
* {{sql/connect/common/.../ProtoDataTypes.scala}}: add builders only if needed (parameterized types are typically built inline, like {{TimeType}}).
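A minimal sketch of the two-direction data type mapping, assuming flattened stand-ins for the Catalyst types and the proto message. {{NtzNanos}}, {{LtzNanos}}, {{ProtoType}}, the kind objects, and the simplified {{toProto}} / {{toCatalyst}} signatures are all hypothetical. The sketch only shows that the kind selects NTZ vs LTZ and that {{precision}} is copied and validated in both directions; the actual change would route this through the {{ConnectTypeOps}} registrations listed above.

```scala
// Hypothetical, simplified models of both sides; the real Catalyst DataType and the
// generated proto classes are not used here.
sealed trait NanosTimestampType { def precision: Int }
final case class NtzNanos(precision: Int) extends NanosTimestampType
final case class LtzNanos(precision: Int) extends NanosTimestampType

sealed trait ProtoKindCase
case object TimestampNtzNanosKind extends ProtoKindCase
case object TimestampLtzNanosKind extends ProtoKindCase
final case class ProtoType(kind: ProtoKindCase, precision: Int)

object NanosTypeSketch {
  // Both types are only defined for fractional-second precisions 7, 8 and 9.
  private def checkPrecision(p: Int): Int = {
    require(p >= 7 && p <= 9, s"precision must be in [7, 9], got $p")
    p
  }

  // Catalyst -> proto: pick the kind from the type, copy the precision.
  def toProto(t: NanosTimestampType): ProtoType = t match {
    case NtzNanos(p) => ProtoType(TimestampNtzNanosKind, checkPrecision(p))
    case LtzNanos(p) => ProtoType(TimestampLtzNanosKind, checkPrecision(p))
  }

  // proto -> Catalyst: dispatch on the kind, validate the precision read from the wire.
  def toCatalyst(t: ProtoType): NanosTimestampType = t.kind match {
    case TimestampNtzNanosKind => NtzNanos(checkPrecision(t.precision))
    case TimestampLtzNanosKind => LtzNanos(checkPrecision(t.precision))
  }
}
```

Validating on the inbound side matters because the proto message may come from any client, not only from a converter that already enforced the range.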
Literal conversion
* {{sql/connect/common/.../LiteralValueProtoConverter.scala}}: outbound ({{toLiteralProtoBuilder}} / {{toLiteralProtoWithType}}) and inbound ({{getScalaConverter}}, {{isCompatible}}, {{getProtoDataType}}, {{toDataType}}) handling for the new literal kinds, encoding and decoding {{epochMicros}} + {{nanosWithinMicro}} + precision.
* Register literal hooks in {{ConnectTypeOps}} ({{toLiteralProtoForValue}}, {{literalCaseToKindCase}}).
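The literal encoding can be sketched the same way. All names below are hypothetical. The assumed wire shape carries {{epochMicros}}, {{nanosWithinMicro}} and {{precision}} plus a flavor tag. Treating unequal precision as incompatible in the sketch's {{isCompatible}} is an assumption made for illustration, not a statement about the real method.

```scala
// Hypothetical flavor tag distinguishing the two literal kinds.
sealed trait NanosFlavor
case object Ntz extends NanosFlavor
case object Ltz extends NanosFlavor

// Hypothetical wire shape of a nanos timestamp literal. Protobuf has no 16-bit
// integer type, so nanosWithinMicro is assumed to travel as an int32.
final case class ProtoNanosLiteral(
    flavor: NanosFlavor, epochMicros: Long, nanosWithinMicro: Int, precision: Int)

// Hypothetical decoded value: the TimestampNanosVal fields plus type information.
final case class NanosLiteral(
    flavor: NanosFlavor, precision: Int, epochMicros: Long, nanosWithinMicro: Short)

object NanosLiteralSketch {
  // Outbound: copy both value fields and the precision into the message.
  def toProto(lit: NanosLiteral): ProtoNanosLiteral =
    ProtoNanosLiteral(lit.flavor, lit.epochMicros, lit.nanosWithinMicro.toInt, lit.precision)

  // Inbound: validate what came off the wire before narrowing to Short.
  def fromProto(msg: ProtoNanosLiteral): NanosLiteral = {
    require(msg.precision >= 7 && msg.precision <= 9, s"invalid precision ${msg.precision}")
    require(msg.nanosWithinMicro >= 0 && msg.nanosWithinMicro <= 999,
      s"invalid nanosWithinMicro ${msg.nanosWithinMicro}")
    NanosLiteral(msg.flavor, msg.precision, msg.epochMicros, msg.nanosWithinMicro.toShort)
  }

  // NTZ and LTZ literals are never interchangeable; equal precision is an assumed extra rule.
  def isCompatible(msg: ProtoNanosLiteral, flavor: NanosFlavor, precision: Int): Boolean =
    msg.flavor == flavor && msg.precision == precision
}
```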
Server-side literal -> Catalyst prerequisite
* {{sql/catalyst/.../CatalystTypeConverters.scala}}: add cases for the two types (via the {{TypeOps}} registration) so that {{LiteralExpressionProtoConverter}} can build Catalyst literals from the decoded values. ({{LiteralExpressionProtoConverter}} itself needs no type-specific change.)
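For the server-side prerequisite, converting external Java values into the two Catalyst fields could look roughly like this. The sketch assumes, by analogy with {{TimestampType}} and {{TimestampNTZType}}, that the external types are {{java.time.Instant}} (LTZ) and {{java.time.LocalDateTime}} (NTZ); {{NanosExternalConversions}} is a hypothetical helper, not part of {{CatalystTypeConverters}}.

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}

// Hypothetical helper; returns (epochMicros, nanosWithinMicro) pairs.
object NanosExternalConversions {
  // getNano is always in [0, 999_999_999], so the split stays non-negative even
  // for pre-epoch values, where only the seconds component is negative.
  private def split(epochSecond: Long, nano: Int): (Long, Short) = {
    val micros = Math.addExact(Math.multiplyExact(epochSecond, 1000000L), (nano / 1000).toLong)
    (micros, (nano % 1000).toShort)
  }

  // LTZ: an Instant is already an absolute point on the UTC time line.
  def fromInstant(i: Instant): (Long, Short) = split(i.getEpochSecond, i.getNano)

  // NTZ: no time zone; the wall-clock fields are interpreted at UTC.
  def fromLocalDateTime(ldt: LocalDateTime): (Long, Short) =
    split(ldt.toEpochSecond(ZoneOffset.UTC), ldt.getNano)

  // Inverse for LTZ: floor division recovers a non-negative micro-of-second.
  def toInstant(epochMicros: Long, nanosWithinMicro: Short): Instant = {
    val seconds = Math.floorDiv(epochMicros, 1000000L)
    val microOfSecond = Math.floorMod(epochMicros, 1000000L)
    Instant.ofEpochSecond(seconds, microOfSecond * 1000L + nanosWithinMicro)
  }
}
```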
Guard rails
* Honor {{spark.sql.timestampNanosTypes.enabled}} on the conversion path, consistent with the Catalyst DDL parser ({{DataTypeErrors.checkTimestampNanosTypesEnabled()}}).
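A minimal sketch of the guard, with the session conf modeled as a plain {{Map}} and a hypothetical exception type. The real check is {{DataTypeErrors.checkTimestampNanosTypesEnabled()}}, whose signature is not reproduced here; treating a missing key as disabled is also an assumption.

```scala
// Hypothetical error type standing in for the real Spark error.
final class NanosTypesDisabledException(msg: String) extends RuntimeException(msg)

object NanosGuard {
  val ConfKey = "spark.sql.timestampNanosTypes.enabled"

  // Assumption: an absent key means "disabled", matching the gated default.
  def checkEnabled(conf: Map[String, String]): Unit = {
    val enabled = conf.get(ConfKey).exists(_.trim.equalsIgnoreCase("true"))
    if (!enabled) {
      throw new NanosTypesDisabledException(
        s"$ConfKey is not enabled; nanosecond timestamp types are unavailable")
    }
  }

  // Run the check before any conversion so that both directions fail the same way.
  def guarded[T](conf: Map[String, String])(convert: => T): T = {
    checkEnabled(conf)
    convert
  }
}
```

Wrapping the conversion in a by-name block keeps the check in one place instead of repeating it in every {{toCatalyst}} / {{toProto}} branch.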
h2. Out of scope
* Proto message definitions and stub regeneration (separate proto sub-task; this sub-task depends on it).
* Arrow IPC type mapping and Arrow encoders (separate sub-tasks).
* PySpark client conversion (separate sub-task).
h2. How was this change tested?
* New round-trip tests: Catalyst {{DataType}} -> proto -> Catalyst for the NTZ and LTZ nanos types across precisions 7-9.
* Literal round-trip: Catalyst {{Literal}} -> proto {{Expression.Literal}} -> Catalyst, asserting that {{epochMicros}}, {{nanosWithinMicro}} and precision are preserved, including boundary values and pre-epoch instants; verify that NTZ and LTZ stay distinct via {{isCompatible}}.
* Negative test: conversion is rejected when {{spark.sql.timestampNanosTypes.enabled=false}}.
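The test matrix above can be expressed as a small self-contained check. The {{NanosLit}} model and the tuple-based "wire" form are hypothetical stand-ins for the real {{Literal}} and {{Expression.Literal}}; the sketch only enumerates NTZ/LTZ, precisions 7 to 9, boundary and pre-epoch {{epochMicros}}, and extreme {{nanosWithinMicro}} values.

```scala
// Hypothetical literal model used only to illustrate the test matrix.
final case class NanosLit(ltz: Boolean, precision: Int, epochMicros: Long, nanosWithinMicro: Short)

object RoundTripCheck {
  private val NtzKind = "TIMESTAMP_NTZ_NANOS"
  private val LtzKind = "TIMESTAMP_LTZ_NANOS"

  // Stand-in "wire" form: (kind, epochMicros, nanosWithinMicro as int32, precision).
  def encode(l: NanosLit): (String, Long, Int, Int) =
    (if (l.ltz) LtzKind else NtzKind, l.epochMicros, l.nanosWithinMicro.toInt, l.precision)

  def decode(w: (String, Long, Int, Int)): NanosLit = {
    val (kind, micros, nanos, p) = w
    val ltz = kind match {
      case LtzKind => true
      case NtzKind => false
      case other   => throw new IllegalArgumentException(s"unknown kind $other")
    }
    require(p >= 7 && p <= 9, s"invalid precision $p")
    require(nanos >= 0 && nanos <= 999, s"invalid nanosWithinMicro $nanos")
    NanosLit(ltz, p, micros, nanos.toShort)
  }

  // NTZ/LTZ x precisions 7-9 x boundary and pre-epoch micros x extreme sub-micro nanos.
  // Whether p = 7 or 8 should further restrict nanosWithinMicro is not modeled here.
  val cases: Seq[NanosLit] = for {
    ltz    <- Seq(false, true)
    p      <- 7 to 9
    micros <- Seq(Long.MinValue, -1L, 0L, 1L, Long.MaxValue)
    nanos  <- Seq(0, 999)
  } yield NanosLit(ltz, p, micros, nanos.toShort)

  // Cases whose encode/decode round trip does not reproduce the input.
  def failures: Seq[NanosLit] = cases.filterNot(c => decode(encode(c)) == c)
}
```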
h2. Does this introduce any user-facing change?
No. The types remain gated behind {{spark.sql.timestampNanosTypes.enabled}}.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)