Max Gekk created SPARK-57101:
--------------------------------

             Summary: Register nanosecond timestamp types in the Types 
Framework (server-side)
                 Key: SPARK-57101
                 URL: https://issues.apache.org/jira/browse/SPARK-57101
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


h3. Summary

Register TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) in 
the Spark SQL Types Framework (SPARK-53504) for server-side (catalyst) 
operations. Logical types and the physical row layer already exist 
(SPARK-56876, SPARK-56981); today these types are wired only through legacy 
dispatch in PhysicalDataType, Literal, InternalRow, and codegen. This issue 
centralizes that wiring behind TypeOps when spark.sql.types.framework.enabled 
is true.

This issue covers physical representation, literals, row accessors, and codegen 
class selection only. java.time conversion, Dataset encoders, Connect proto, 
Arrow, and cast formatting are out of scope and will be handled in follow-up 
issues after SPARK-57033 and related work land.

h3. Background

* Parent SPIP: SPARK-56822 (Timestamps with nanosecond precision)
* Types Framework: SPARK-53504; reference implementation is TimeTypeOps / 
TimeTypeApiOps
* Merged foundation:
** SPARK-56876 — logical types TimestampNTZNanosType / TimestampLTZNanosType
** SPARK-56981 — physical value TimestampNanosVal, 
PhysicalTimestampNTZNanosType / PhysicalTimestampLTZNanosType, InternalRow and 
UnsafeRow accessors (PR #56059)
* Internal representation: epochMicros (long) + nanosWithinMicro (short, 
0–999), stored as TimestampNanosVal in rows
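
The two-field layout above can be illustrated with a minimal sketch. NanosVal is a hypothetical stand-in for TimestampNanosVal, and fromEpochNanos / toEpochNanos are illustrative helpers, not confirmed members of the real class:

```scala
object NanosValSketch {
  // Hypothetical stand-in for org.apache.spark.unsafe.types.TimestampNanosVal;
  // the real class's constructor and methods may differ.
  final case class NanosVal(epochMicros: Long, nanosWithinMicro: Short) {
    require(nanosWithinMicro >= 0 && nanosWithinMicro <= 999,
      s"nanosWithinMicro must be in [0, 999], got $nanosWithinMicro")
  }

  // Split a nanosecond count since the epoch into the two stored fields.
  // floorDiv/floorMod keep the sub-microsecond part non-negative for
  // instants before the epoch.
  def fromEpochNanos(epochNanos: Long): NanosVal =
    NanosVal(Math.floorDiv(epochNanos, 1000L), Math.floorMod(epochNanos, 1000L).toShort)

  // Recombine the two fields; throws ArithmeticException on Long overflow.
  def toEpochNanos(v: NanosVal): Long =
    Math.addExact(Math.multiplyExact(v.epochMicros, 1000L), v.nanosWithinMicro.toLong)
}
```

Using floor division rather than truncating division is what keeps nanosWithinMicro inside 0 to 999 for pre-epoch values in this sketch.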

h3. What to do

*Add TypeOps implementations (sql/catalyst)*

* Create TimestampNTZNanosTypeOps and TimestampLTZNanosTypeOps (shared base for 
common logic), following the TimeTypeOps pattern.
* Register both in TypeOps.apply() — single registration point alongside 
TimeType.

*Implement TypeOps methods using existing SPARK-56981 behavior:*

|| Method || Behavior ||
| getPhysicalType | PhysicalTimestampNTZNanosType or PhysicalTimestampLTZNanosType |
| getJavaClass | classOf[TimestampNanosVal] |
| getRowWriter | setTimestampNTZNanos / setTimestampLTZNanos on InternalRow |
| getDefaultLiteral | Literal.create(TimestampNanosVal.ZERO, type) |
| getJavaLiteral | Java literal for codegen (e.g. TimestampNanosVal.ZERO or fromParts) |
| getMutableValue | Mutable holder for TimestampNanosVal in SpecificInternalRow (new MutableTimestampNanos or equivalent; avoid unnecessary MutableAny fallback) |
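
As a rough illustration of the shared-base structure, the sketch below uses simplified stand-in types only. The TypeOps trait here is a hypothetical subset of the real contract, getDefaultValue stands in for getDefaultLiteral, and none of the signatures are taken from Spark's source:

```scala
object NanosTypeOpsSketch {
  // All types below are simplified stand-ins, not Spark's real classes.
  final case class NanosVal(epochMicros: Long, nanosWithinMicro: Short)
  object NanosVal { val Zero: NanosVal = NanosVal(0L, 0.toShort) }

  sealed trait DataType
  final case class NtzNanosType(precision: Int) extends DataType
  final case class LtzNanosType(precision: Int) extends DataType

  sealed trait PhysicalType
  case object PhysicalNtzNanos extends PhysicalType
  case object PhysicalLtzNanos extends PhysicalType

  // Hypothetical subset of the TypeOps contract.
  trait TypeOps {
    def getPhysicalType: PhysicalType
    def getJavaClass: Class[_]
    def getDefaultValue: Any
  }

  // Shared base: everything except the physical type is identical
  // for the NTZ and LTZ variants.
  abstract class NanosTypeOpsBase extends TypeOps {
    override def getJavaClass: Class[_] = classOf[NanosVal]
    override def getDefaultValue: Any = NanosVal.Zero
  }

  final class NtzNanosTypeOps(val t: NtzNanosType) extends NanosTypeOpsBase {
    override def getPhysicalType: PhysicalType = PhysicalNtzNanos
  }

  final class LtzNanosTypeOps(val t: LtzNanosType) extends NanosTypeOpsBase {
    override def getPhysicalType: PhysicalType = PhysicalLtzNanos
  }
}
```

Keeping the value-related methods in the base class means the two concrete classes differ only in the physical type they report, which is the part of the table that actually varies between NTZ and LTZ.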

*Add minimal TypeApiOps stubs (sql/api)*

* Create TimestampNTZNanosTypeApiOps and TimestampLTZNanosTypeApiOps registered 
in TypeApiOps.apply().
* TimestampNTZNanosTypeOps / TimestampLTZNanosTypeOps extend the corresponding 
ApiOps class and TypeOps (same pattern as TimeTypeOps extends TimeTypeApiOps).
* format / formatUTF8 / toSQLValue: an interim implementation is acceptable 
(e.g. epoch-micros-based display or TimestampNanosVal.toString) until dedicated 
fractional-seconds-precision (FSP) formatters exist in a follow-up issue.
* getEncoder: not in scope for this issue.
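
One possible shape for an interim display of the NTZ variant is sketched below. The helper formatNtz, its parameters, and the choice of java.time for rendering are assumptions made for illustration; the issue does not prescribe this output format:

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

object InterimFormatSketch {
  // Hypothetical helper: renders the two stored fields as a wall-clock
  // string with `precision` fractional digits (truncating, not rounding).
  def formatNtz(epochMicros: Long, nanosWithinMicro: Short, precision: Int): String = {
    require(precision >= 7 && precision <= 9, s"precision must be in [7, 9], got $precision")
    val instant = Instant.EPOCH
      .plus(epochMicros, ChronoUnit.MICROS)
      .plusNanos(nanosWithinMicro.toLong)
    // The pattern letter S repeated n times prints exactly n fraction digits.
    val fmt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss." + ("S" * precision))
    // NTZ values carry no zone, so they are rendered as-is at offset zero.
    LocalDateTime.ofInstant(instant, ZoneOffset.UTC).format(fmt)
  }
}
```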

*Integration points (automatic when TypeOps returns Some)*

These call sites already delegate to TypeOps(dt).map(...).getOrElse(legacy); no 
per-call-site edits should be required beyond registration:

* PhysicalDataType.apply
* Literal.default
* InternalRow.getWriter
* CodeGenerator / EncoderUtils Java class for codegen
* SpecificInternalRow mutable column values

*Feature flag*

* All registration is gated by spark.sql.types.framework.enabled (same as 
TimeType).
* When the flag is false, behavior must remain identical to current legacy 
paths.
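
The flag gating and the TypeOps(dt).map(...).getOrElse(legacy) call-site shape can be sketched together as follows. Registry, its enabled parameter (standing in for spark.sql.types.framework.enabled), and the string-valued physicalTypeName are hypothetical simplifications, not Spark's actual API:

```scala
object FrameworkGateSketch {
  sealed trait DataType
  final case class NtzNanosType(precision: Int) extends DataType
  case object IntType extends DataType

  // Hypothetical, heavily reduced TypeOps contract.
  trait TypeOps { def physicalTypeName: String }

  // Hypothetical registry; `enabled` stands in for the SQL conf
  // spark.sql.types.framework.enabled.
  final class Registry(enabled: Boolean) {
    def apply(dt: DataType): Option[TypeOps] = dt match {
      case _: NtzNanosType if enabled =>
        Some(new TypeOps { def physicalTypeName = "PhysicalTimestampNTZNanosType" })
      case _ => None // flag off, or type not registered: caller falls back
    }
  }

  // Stand-in for an existing legacy dispatch branch.
  def legacyPhysicalTypeName(dt: DataType): String = dt match {
    case _: NtzNanosType => "PhysicalTimestampNTZNanosType"
    case IntType => "PhysicalIntegerType"
  }

  // Call-site shape: framework first, legacy as fallback.
  def physicalTypeName(registry: Registry, dt: DataType): String =
    registry(dt).map(_.physicalTypeName).getOrElse(legacyPhysicalTypeName(dt))
}
```

Because the registry returns None whenever the flag is off, the call site needs no flag check of its own; the legacy branch is reached through the same getOrElse path it always used.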

h3. Tests

* With spark.sql.types.framework.enabled=true:
** PhysicalDataType(TimestampNTZNanosType(9)) and the LTZ variant return the 
correct physical types (not UninitializedPhysicalType).
** Literal.default matches TimestampNanosVal.ZERO for both nanos types.
** InternalRow.getWriter roundtrip: set and read via accessor for NTZ and LTZ.
** SpecificInternalRow update/read for nanos columns.
* With the flag false: regression tests confirm no behavior change versus the 
legacy paths on master.
* Framework-on vs framework-off equivalence tests for the operations above.
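
The writer roundtrip check might look roughly like the sketch below. SimpleRow is a hypothetical array-backed row, and writer only imitates the (row, value) => Unit shape of a row writer; the real InternalRow accessors for nanos types are defined by SPARK-56981:

```scala
object RowWriterRoundTripSketch {
  // Hypothetical stand-in for TimestampNanosVal.
  final case class NanosVal(epochMicros: Long, nanosWithinMicro: Short)

  // Hypothetical array-backed row; Spark's InternalRow API is richer.
  final class SimpleRow(numFields: Int) {
    private val values = new Array[Any](numFields)
    def setTimestampNanos(i: Int, v: NanosVal): Unit = values(i) = v
    def getTimestampNanos(i: Int): NanosVal = values(i).asInstanceOf[NanosVal]
  }

  // Bind the ordinal once, then write untyped values through the typed setter.
  def writer(ordinal: Int): (SimpleRow, Any) => Unit =
    (row, value) => row.setTimestampNanos(ordinal, value.asInstanceOf[NanosVal])

  // Write a value through the writer and read it back through the accessor.
  def roundTrip(v: NanosVal): NanosVal = {
    val row = new SimpleRow(1)
    writer(0)(row, v)
    row.getTimestampNanos(0)
  }
}
```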

h3. Acceptance criteria

* TypeOps(TimestampNTZNanosType(p)) and TypeOps(TimestampLTZNanosType(p)) 
return non-empty ops when spark.sql.types.framework.enabled=true, for p in {7, 
8, 9}.
* Listed integration points use TypeOps implementations and match legacy 
behavior.
* spark.sql.types.framework.enabled=false preserves current behavior.
* No change to UnsafeRow layout, TimestampNanosRowValues, or microsecond 
TimestampType / TimestampNTZType behavior.

h3. Out of scope

* CatalystTypeConverters and java.time roundtrip (SPARK-57033)
* SerializerBuildHelper / DeserializerBuildHelper and RowEncoder encoders
* ConnectTypeOps and Connect proto literals
* Arrow type mapping and ArrowFieldWriter
* PySpark conversion (EvaluatePython)
* Cast matrix, Parquet read/write, ColumnVector / vectorized Parquet
* Physical ordering, compare, and hash for nanos types
* Removing legacy branches from PhysicalDataType.applyDefault (optional cleanup 
in a later issue)

h3. Depends on

* SPARK-56981 (physical row layer and TimestampNanosVal)

h3. References

* SPARK-56822 — parent SPIP
* SPARK-53504 — Types Framework
* Precedent: org.apache.spark.sql.catalyst.types.ops.TimeTypeOps
* Physical value: org.apache.spark.unsafe.types.TimestampNanosVal



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
