[jira] [Created] (SPARK-57100) Add columnar (ColumnVector) support for nanosecond timestamp types

Max Gekk (Jira) Wed, 27 May 2026 06:52:07 -0700

Max Gekk created SPARK-57100:
--------------------------------

             Summary: Add columnar (ColumnVector) support for nanosecond 
timestamp types
                 Key: SPARK-57100
                 URL: https://issues.apache.org/jira/browse/SPARK-57100
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk



h3. Summary

SPARK-56981 added physical row storage for TimestampNTZNanosType(p) and 
TimestampLTZNanosType(p) (p in [7, 9]) via TimestampNanosVal and UnsafeRow. 
Columnar execution still cannot hold or move these values: 
ColumnVector.getTimestampNTZNanos / getTimestampLTZNanos throw 
SparkUnsupportedOperationException, and RowToColumnConverter / 
ColumnVectorUtils do not handle these types.

This issue implements the columnar layer so ColumnarBatch can store nanosecond 
timestamps and interoperate with InternalRow / UnsafeRow (ColumnarToRow, 
RowToColumnar, whole-stage codegen paths that read column vectors).

Parquet vectorized decode (ParquetVectorUpdaterFactory, TIMESTAMP(NANOS) pages) 
is a separate follow-up that depends on this issue.

h3. Background

* Logical types and parser: SPARK-56876, SPARK-56965
* Physical / UnsafeRow layer: SPARK-56981 (merged, PR #56059)
* SPIP composite value: epochMicros (long) + nanosWithinMicro (short, 0-999)
* UnsafeRow uses a 16-byte variable-length payload; column batches should use a 
fixed struct-like layout (see below), not the UnsafeRow blob layout.
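
As an illustration of the composite value described above, here is a minimal sketch of splitting a nanosecond epoch offset into epochMicros plus nanosWithinMicro. The class NanosValueSketch and its methods are hypothetical stand-ins, not Spark's TimestampNanosVal, and the use of floor division for pre-epoch values is an assumption of this sketch rather than something stated in the ticket.

```java
// Hypothetical stand-in for the SPIP composite value; not Spark's TimestampNanosVal.
final class NanosValueSketch {
    final long epochMicros;
    final short nanosWithinMicro; // always in [0, 999]

    NanosValueSketch(long epochMicros, short nanosWithinMicro) {
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException(
                "nanosWithinMicro out of range: " + nanosWithinMicro);
        }
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
    }

    // Splits nanoseconds since the epoch. Floor semantics (an assumption of this
    // sketch) keep the remainder in [0, 999] even for instants before the epoch.
    static NanosValueSketch fromEpochNanos(long epochNanos) {
        long micros = Math.floorDiv(epochNanos, 1000L);
        short nanos = (short) Math.floorMod(epochNanos, 1000L);
        return new NanosValueSketch(micros, nanos);
    }

    // Recombines both fields; throws ArithmeticException if the result overflows a long.
    long toEpochNanos() {
        return Math.addExact(Math.multiplyExact(epochMicros, 1000L), (long) nanosWithinMicro);
    }
}
```

With floor semantics, one nanosecond before the epoch splits into epochMicros = -1 and nanosWithinMicro = 999, so the sub-microsecond field never goes negative.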

h3. Recommended column layout

Mirror CalendarInterval (multi-child column), not a single primitive column:

|| Child || Spark type || Field ||
| 0 | LongType | epochMicros |
| 1 | IntegerType | nanosWithinMicro (0-999) |

NTZ and LTZ share the same physical column layout; SQL semantics stay on the 
logical type (same pattern as row layer).
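
The layout in the table can be pictured with a small toy column. This is only a sketch under stated assumptions: NanosColumnSketch, its Kind tag, and its methods are hypothetical and do not exist in Spark. It shows two parallel child arrays with a parent-level null flag, and a logical tag that distinguishes NTZ from LTZ without changing the storage.

```java
// Hypothetical toy column; it only mimics the two-child layout from the table.
final class NanosColumnSketch {
    // Logical tag only: NTZ and LTZ use exactly the same storage below.
    enum Kind { NTZ, LTZ }

    final Kind kind;
    private final long[] epochMicros;     // child 0 (LongType)
    private final int[] nanosWithinMicro; // child 1 (IntegerType)
    private final boolean[] isNull;       // null flags kept on the parent

    NanosColumnSketch(Kind kind, int capacity) {
        this.kind = kind;
        this.epochMicros = new long[capacity];
        this.nanosWithinMicro = new int[capacity];
        this.isNull = new boolean[capacity];
    }

    // Writes both children for one row and clears its null flag.
    void put(int rowId, long micros, int nanos) {
        if (nanos < 0 || nanos > 999) {
            throw new IllegalArgumentException("nanosWithinMicro out of range: " + nanos);
        }
        epochMicros[rowId] = micros;
        nanosWithinMicro[rowId] = nanos;
        isNull[rowId] = false;
    }

    void putNull(int rowId) { isNull[rowId] = true; }

    boolean isNullAt(int rowId) { return isNull[rowId]; }

    long getEpochMicros(int rowId) { return epochMicros[rowId]; }

    int getNanosWithinMicro(int rowId) { return nanosWithinMicro[rowId]; }
}
```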

h3. What to do

*ColumnVector API (sql/catalyst)*
* Implement default getTimestampNTZNanos / getTimestampLTZNanos on ColumnVector 
using getChild(0).getLong + getChild(1).getInt (remove throw).
* WritableColumnVector: allocate two child columns for TimestampNTZNanosType / 
TimestampLTZNanosType in the constructor (like CalendarIntervalType).
* Add putTimestampNanos (or putTimestampNTZNanos / putTimestampLTZNanos) and 
append paths that write both children.
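
A rough sketch of how a default getter composed from two children and an append path might fit together is shown here. Every class and method name (ToyVector, ToyNanosVector, appendTimestampNanos, and so on) is hypothetical; this is not Spark's ColumnVector hierarchy, and the value is returned as a plain two-element array rather than a TimestampNanosVal.

```java
// All names are hypothetical; this is not Spark's ColumnVector hierarchy.
abstract class ToyVector {
    boolean isNullAt(int rowId) { return false; }
    long getLong(int rowId) { throw new UnsupportedOperationException(); }
    int getInt(int rowId) { throw new UnsupportedOperationException(); }
    ToyVector getChild(int ordinal) { throw new UnsupportedOperationException(); }

    // Default getter built from the children: {epochMicros, nanosWithinMicro}, or null.
    long[] getTimestampNanos(int rowId) {
        if (isNullAt(rowId)) return null;
        return new long[] { getChild(0).getLong(rowId), getChild(1).getInt(rowId) };
    }
}

final class ToyLongVector extends ToyVector {
    private long[] data;
    ToyLongVector(int capacity) { data = new long[capacity]; }
    void reserve(int required) {
        if (required > data.length) {
            data = java.util.Arrays.copyOf(data, Math.max(required, data.length * 2));
        }
    }
    void putLong(int rowId, long value) { data[rowId] = value; }
    @Override long getLong(int rowId) { return data[rowId]; }
}

final class ToyIntVector extends ToyVector {
    private int[] data;
    ToyIntVector(int capacity) { data = new int[capacity]; }
    void reserve(int required) {
        if (required > data.length) {
            data = java.util.Arrays.copyOf(data, Math.max(required, data.length * 2));
        }
    }
    void putInt(int rowId, int value) { data[rowId] = value; }
    @Override int getInt(int rowId) { return data[rowId]; }
}

final class ToyNanosVector extends ToyVector {
    private final ToyLongVector micros = new ToyLongVector(4); // child 0
    private final ToyIntVector nanos = new ToyIntVector(4);    // child 1
    private boolean[] nulls = new boolean[4];
    private int size = 0;

    @Override ToyVector getChild(int ordinal) {
        if (ordinal == 0) return micros;
        if (ordinal == 1) return nanos;
        throw new IndexOutOfBoundsException("child " + ordinal);
    }

    @Override boolean isNullAt(int rowId) { return nulls[rowId]; }

    // Append path: grow both children together, then write both; returns the row id.
    int appendTimestampNanos(long epochMicros, int nanosWithinMicro) {
        reserve(size + 1);
        micros.putLong(size, epochMicros);
        nanos.putInt(size, nanosWithinMicro);
        return size++;
    }

    int appendNull() {
        reserve(size + 1);
        nulls[size] = true;
        return size++;
    }

    private void reserve(int required) {
        micros.reserve(required);
        nanos.reserve(required);
        if (required > nulls.length) {
            nulls = java.util.Arrays.copyOf(nulls, Math.max(required, nulls.length * 2));
        }
    }
}
```

The point of the sketch is that the getter needs no storage of its own: once both children exist and the write paths keep them in step, reads fall out of the existing child accessors.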

*On-heap / off-heap vectors (sql/core)*
* OnHeapColumnVector / OffHeapColumnVector: read/write/append for nanos columns.
* ConstantColumnVector: set/get for constant nanos values.
* MutableColumnarRow: ensure setters write through to WritableColumnVector 
(getters already delegate).
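
For the constant-vector case, the idea is that one stored pair answers every row. A minimal sketch follows, assuming a hypothetical ConstantNanosSketch class; it is not Spark's ConstantColumnVector and its method names are illustrative only.

```java
// Hypothetical toy; not Spark's ConstantColumnVector.
final class ConstantNanosSketch {
    private final int numRows;
    private long epochMicros;
    private int nanosWithinMicro;
    private boolean isNull = true; // treated as null until a value is set

    ConstantNanosSketch(int numRows) { this.numRows = numRows; }

    // Stores the single value that every row will report.
    void setTimestampNanos(long epochMicros, int nanosWithinMicro) {
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException(
                "nanosWithinMicro out of range: " + nanosWithinMicro);
        }
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
        this.isNull = false;
    }

    void setNull() { isNull = true; }

    boolean isNullAt(int rowId) { checkRow(rowId); return isNull; }

    long getEpochMicros(int rowId) { checkRow(rowId); return epochMicros; }

    int getNanosWithinMicro(int rowId) { checkRow(rowId); return nanosWithinMicro; }

    private void checkRow(int rowId) {
        if (rowId < 0 || rowId >= numRows) {
            throw new IndexOutOfBoundsException("rowId " + rowId);
        }
    }
}
```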

*Row <-> column bridges*
* RowToColumnConverter (Columnar.scala): TimestampNanosConverter (like 
CalendarConverter) using row.getTimestampNTZNanos / row.getTimestampLTZNanos.
* ColumnVectorUtils: populate and appendValue for PhysicalTimestampNTZNanosType 
/ PhysicalTimestampLTZNanosType.
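
The bridge logic amounts to a null check followed by two child writes on the way in, and the reverse on the way out. The following sketch shows that roundtrip on toy data. NanosRoundTripSketch and its nested Column class are hypothetical, rows are modelled as plain long[] pairs (null for SQL NULL), and none of this uses Spark's InternalRow or converter classes.

```java
// Hypothetical names throughout; a toy stand-in for the row <-> column bridges.
final class NanosRoundTripSketch {
    // Toy column: two parallel child arrays plus null flags.
    static final class Column {
        final long[] epochMicros;
        final int[] nanosWithinMicro;
        final boolean[] isNull;
        int size = 0;

        Column(int capacity) {
            epochMicros = new long[capacity];
            nanosWithinMicro = new int[capacity];
            isNull = new boolean[capacity];
        }
    }

    // Row -> column: append one cell, either marking null or writing both children.
    // A cell is {epochMicros, nanosWithinMicro}, or null for SQL NULL.
    static void append(long[] cell, Column col) {
        int rowId = col.size++;
        if (cell == null) {
            col.isNull[rowId] = true;
            return;
        }
        col.epochMicros[rowId] = cell[0];
        col.nanosWithinMicro[rowId] = (int) cell[1];
    }

    static Column toColumn(long[][] rows) {
        Column col = new Column(rows.length);
        for (long[] cell : rows) {
            append(cell, col);
        }
        return col;
    }

    // Column -> row: rebuild each cell from the two children, preserving nulls.
    static long[][] toRows(Column col) {
        long[][] out = new long[col.size][];
        for (int i = 0; i < col.size; i++) {
            out[i] = col.isNull[i]
                ? null
                : new long[] { col.epochMicros[i], col.nanosWithinMicro[i] };
        }
        return out;
    }
}
```

A roundtrip check in the spirit of the tests listed in this ticket would compare the input of toColumn with the output of toRows, including a null cell and a pre-epoch value.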

*Columnar surface stubs*
* ColumnVector / ColumnarRow / ColumnarArray / ColumnarBatchRow: already 
delegate to ColumnVector; verify after base implementation.
* ColumnVector stubs that still throw UnsupportedOperationException may remain, 
provided they are documented, until vectorized Parquet/columnar writers land; 
this ticket focuses on read/get/put/append and row roundtrip.

*Codegen*
* CodeGenerator already emits getTimestampNTZNanos / getTimestampLTZNanos for 
columnar inputs; no change expected once ColumnVector implements getters.

h3. Tests

* Unit tests: write/read/append/null handling on OnHeapColumnVector (and 
OffHeap if enabled in tests).
* RowToColumnar -> ColumnarToRow -> UnsafeProjection roundtrip for NTZ and LTZ 
nanos types (null and non-null).
* Regression: microsecond TimestampType / TimestampNTZType column vectors 
unchanged.

h3. Acceptance criteria

* ColumnarBatch can be built from InternalRow rows containing TimestampNanosVal 
for nanos timestamp columns.
* ColumnarBatch.rowIterator() + UnsafeProjection produces UnsafeRow values 
equal to the source row for nanos columns.
* getTimestampNTZNanos / getTimestampLTZNanos on column vectors return correct 
TimestampNanosVal for batch rows.
* RowToColumnConverter no longer throws unsupportedDataTypeError for 
TimestampNTZNanosType / TimestampLTZNanosType.

h3. Unblocks

* Parquet vectorized read of TIMESTAMP(NANOS) into ColumnarBatch.
* Vectorized scan performance for nanos columns; RowToColumnarExec / 
ColumnarToRowExec in nanos pipelines.

h3. References

* Parent: SPARK-56822 (SPIP: Timestamps with nanosecond precision)
* Precedent: CalendarInterval column layout in WritableColumnVector and 
Columnar.scala
* Physical value: org.apache.spark.unsafe.types.TimestampNanosVal



