Max Gekk created SPARK-57161:
--------------------------------

             Summary: Convert nanosecond-capable timestamp types and literals between proto and Catalyst in Spark Connect
                 Key: SPARK-57161
                 URL: https://issues.apache.org/jira/browse/SPARK-57161
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h2. What

Implement the proto <-> Catalyst conversion for {{TimestampNTZNanosType(p)}} and
{{TimestampLTZNanosType(p)}} (p in [7, 9]) in Spark Connect's shared {{connect-common}}
converters, so that the protocol messages added in the proto sub-task become usable by both
the JVM Connect client and the Connect server.

Parent: SPARK-56822. Depends on the Spark Connect protocol sub-task (proto definitions for
the nanos timestamp data types and literals).

h2. Why

The proto sub-task only adds wire surface; nothing yet maps those messages to/from Catalyst
{{DataType}} and {{Literal}}. Until these converters exist, schema responses, casts, UDF
input/output types, and literal expressions containing nanosecond timestamps fail in
Connect with {{CONNECT_INVALID_PLAN.DATA_TYPE_UNSUPPORTED_*}}. These converters are shared
by the JVM client and the server, so this unblocks both at once.

The Catalyst physical value is {{org.apache.spark.unsafe.types.TimestampNanosVal}}
({{epochMicros: Long}} + {{nanosWithinMicro: Short}} in [0, 999]).
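
For illustration, a minimal sketch of how an instant given as nanoseconds since the epoch could be split into the two fields named above. {{NanosParts}}, {{fromEpochNanos}} and {{toEpochNanos}} are hypothetical stand-ins, not the real {{TimestampNanosVal}} API. The sketch assumes floor semantics, which is what keeps {{nanosWithinMicro}} in [0, 999] for pre-epoch instants.

```scala
// Hypothetical stand-in for the (epochMicros, nanosWithinMicro) pair.
// This is NOT org.apache.spark.unsafe.types.TimestampNanosVal.
final case class NanosParts(epochMicros: Long, nanosWithinMicro: Short) {
  require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
    s"nanosWithinMicro must be in [0, 999], got $nanosWithinMicro")

  // Recombine into nanoseconds since the epoch.
  // Throws ArithmeticException if the result does not fit in a Long.
  def toEpochNanos: Long =
    Math.addExact(Math.multiplyExact(epochMicros, 1000L), nanosWithinMicro.toLong)
}

object NanosParts {
  // Floor division and floor modulus keep the remainder non-negative,
  // so instants before the epoch still land in [0, 999].
  def fromEpochNanos(epochNanos: Long): NanosParts =
    NanosParts(
      Math.floorDiv(epochNanos, 1000L),
      Math.floorMod(epochNanos, 1000L).toShort)
}
```

Under these assumptions, one nanosecond before the epoch ({{fromEpochNanos(-1L)}}) splits into {{epochMicros = -1}} and {{nanosWithinMicro = 999}}. Truncating division would instead produce a negative remainder.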

h2. Scope

Data type conversion
* {{sql/connect/common/.../DataTypeProtoConverter.scala}}: both directions
({{toCatalystType}} and {{toConnectProtoType}}) for the new {{TIMESTAMP_NTZ_NANOS}} /
{{TIMESTAMP_LTZ_NANOS}} kinds, reading/writing {{precision}}.
* Prefer the Types Framework path: add {{TimestampNTZNanosTypeConnectOps}} /
{{TimestampLTZNanosTypeConnectOps}} (mirroring {{TimeTypeConnectOps}}) and register them in
{{sql/connect/common/.../types/ops/ConnectTypeOps.scala}} ({{apply}}, {{opsForKindCase}}).
* {{sql/connect/common/.../ProtoDataTypes.scala}}: add builders only if needed
(parameterized types typically build inline, like {{TimeType}}).
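
The shape of the two-direction data type mapping can be sketched as follows. Every name in this block ({{NanosType}}, {{ProtoType}}, {{NanosTypeConverter}}, and so on) is a hypothetical stand-in for the Catalyst types and the generated proto classes; only the two kind names and the [7, 9] precision range come from this issue.

```scala
// Hypothetical stand-ins for the Catalyst nanos timestamp types.
sealed trait NanosType { def precision: Int }
final case class NtzNanos(precision: Int) extends NanosType
final case class LtzNanos(precision: Int) extends NanosType

// Hypothetical stand-ins for the proto kinds and the proto data type message.
sealed trait ProtoKind
case object TIMESTAMP_NTZ_NANOS extends ProtoKind
case object TIMESTAMP_LTZ_NANOS extends ProtoKind
final case class ProtoType(kind: ProtoKind, precision: Int)

object NanosTypeConverter {
  // Reject precisions outside the [7, 9] range in both directions.
  private def checkPrecision(p: Int): Int = {
    require(p >= 7 && p <= 9, s"precision must be in [7, 9], got $p")
    p
  }

  // Catalyst -> proto: pick the kind and write the precision.
  def toProto(t: NanosType): ProtoType = t match {
    case NtzNanos(p) => ProtoType(TIMESTAMP_NTZ_NANOS, checkPrecision(p))
    case LtzNanos(p) => ProtoType(TIMESTAMP_LTZ_NANOS, checkPrecision(p))
  }

  // proto -> Catalyst: dispatch on the kind and read the precision back.
  def toCatalyst(p: ProtoType): NanosType = p.kind match {
    case TIMESTAMP_NTZ_NANOS => NtzNanos(checkPrecision(p.precision))
    case TIMESTAMP_LTZ_NANOS => LtzNanos(checkPrecision(p.precision))
  }
}
```

In this sketch, {{toCatalyst(toProto(t)) == t}} holds for both kinds at every precision from 7 to 9, and the NTZ and LTZ kinds never map onto each other.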

Literal conversion
* {{sql/connect/common/.../LiteralValueProtoConverter.scala}}: outbound
({{toLiteralProtoBuilder}} / {{toLiteralProtoWithType}}) and inbound
({{getScalaConverter}}, {{isCompatible}}, {{getProtoDataType}}, {{toDataType}}) handling for
the new literal kinds, encoding/decoding {{epochMicros}} + {{nanosWithinMicro}} + precision.
* Register literal hooks in {{ConnectTypeOps}} ({{toLiteralProtoForValue}},
{{literalCaseToKindCase}}).
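
A rough sketch of the literal encode/decode step, under stated assumptions. {{WireLiteral}}, {{NanosValue}} and {{NanosLiteralCodec}} are hypothetical, and the actual proto field names and layout are defined in the separate proto sub-task, not here. The sketch assumes the wire side carries {{nanosWithinMicro}} as a 32-bit integer, since protobuf has no 16-bit scalar type, so the inbound path validates the range before narrowing.

```scala
// Hypothetical kind tag distinguishing NTZ from LTZ literals.
sealed trait NanosKind
case object Ntz extends NanosKind
case object Ltz extends NanosKind

// Hypothetical wire-side literal; real proto field names may differ.
final case class WireLiteral(
    kind: NanosKind, epochMicros: Long, nanosWithinMicro: Int, precision: Int)

// Hypothetical Catalyst-side value mirroring (epochMicros, nanosWithinMicro).
final case class NanosValue(
    kind: NanosKind, epochMicros: Long, nanosWithinMicro: Short, precision: Int)

object NanosLiteralCodec {
  // Outbound: widen the Short to an Int for the wire representation.
  def encode(v: NanosValue): WireLiteral =
    WireLiteral(v.kind, v.epochMicros, v.nanosWithinMicro.toInt, v.precision)

  // Inbound: validate first, so out-of-range wire values are rejected
  // instead of being silently truncated by the narrowing conversion.
  def decode(w: WireLiteral): NanosValue = {
    require(w.precision >= 7 && w.precision <= 9,
      s"precision must be in [7, 9], got ${w.precision}")
    require(w.nanosWithinMicro >= 0 && w.nanosWithinMicro <= 999,
      s"nanosWithinMicro must be in [0, 999], got ${w.nanosWithinMicro}")
    NanosValue(w.kind, w.epochMicros, w.nanosWithinMicro.toShort, w.precision)
  }
}
```

The kind tag travels with the value in both directions, so in this sketch an NTZ literal can never decode as an LTZ one.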

Server-side literal -> Catalyst prerequisite
* {{sql/catalyst/.../CatalystTypeConverters.scala}}: add cases for the two types (via the
{{TypeOps}} registration) so {{LiteralExpressionProtoConverter}} can build Catalyst literals
from the decoded values. ({{LiteralExpressionProtoConverter}} itself needs no type-specific
change.)

Guard rails
* Honor {{spark.sql.timestampNanosTypes.enabled}} on the conversion path, consistent with the
Catalyst DDL parser ({{DataTypeErrors.checkTimestampNanosTypesEnabled()}}).
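
One way such a guard could wrap a conversion is sketched below. {{NanosGuard}}, {{checkEnabled}} and {{guarded}} are hypothetical; the real check is {{DataTypeErrors.checkTimestampNanosTypesEnabled()}}, whose signature and error class are not reproduced here. The sketch also assumes the flag is treated as disabled when unset, and uses a plain {{Map}} in place of the real configuration object.

```scala
object NanosGuard {
  // Config key taken from this issue.
  val ConfKey: String = "spark.sql.timestampNanosTypes.enabled"

  // Hypothetical check: an unset or non-"true" value counts as disabled (assumption).
  def checkEnabled(conf: Map[String, String]): Unit = {
    val enabled = conf.get(ConfKey).exists(_.trim.equalsIgnoreCase("true"))
    if (!enabled) {
      throw new UnsupportedOperationException(
        s"Nanosecond timestamp types are disabled; set $ConfKey=true to enable them")
    }
  }

  // Run the check before evaluating the by-name conversion,
  // so a disabled flag fails fast without doing any conversion work.
  def guarded[A](conf: Map[String, String])(convert: => A): A = {
    checkEnabled(conf)
    convert
  }
}
```

Because {{convert}} is passed by name, its body is never evaluated when the flag is off.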

h2. Out of scope

* Proto message definitions and stub regeneration (separate proto sub-task; this depends on
it).
* Arrow IPC type mapping and Arrow encoders (separate sub-tasks).
* PySpark client conversion (separate sub-task).

h2. How was this change tested

* New round-trip tests: Catalyst {{DataType}} -> proto -> Catalyst for NTZ and LTZ nanos
types across precisions 7-9.
* Literal round-trip: Catalyst {{Literal}} -> proto {{Expression.Literal}} -> Catalyst,
asserting {{epochMicros}} / {{nanosWithinMicro}} / precision are preserved, including
boundary values and pre-epoch instants; verify NTZ vs LTZ stay distinct via {{isCompatible}}.
* Negative test: conversion is rejected when {{spark.sql.timestampNanosTypes.enabled=false}}.

h2. Does this introduce any user-facing change

No. The types remain gated behind {{spark.sql.timestampNanosTypes.enabled}}.


