Max Gekk created SPARK-56981:
--------------------------------

             Summary: Add physical representation and UnsafeRow support for 
nanosecond-capable timestamp types
                 Key: SPARK-56981
                 URL: https://issues.apache.org/jira/browse/SPARK-56981
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Max Gekk
            Assignee: Max Gekk


h3. Summary

[PR #55952|https://github.com/apache/spark/pull/55952] / SPARK-56876 added the 
_logical_ types {{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}} 
(p ∈ [7, 9]) and their JSON metadata. Both types still map to 
{{UninitializedPhysicalType}} in {{PhysicalDataType.apply}}, so the engine 
cannot store or access their values in {{InternalRow}} / {{UnsafeRow}}.

This issue delivers the _minimum_ physical layer aligned with the merged SPIP 
model: *epoch microseconds (8 bytes) + nanoseconds within the microsecond 
(0–999, 2 bytes)* — see {{defaultSize = 10}} on the logical types. One shared 
unsafe value representation at the row layer is fine for both NTZ and LTZ nanos 
types; semantic differences stay in logical/SQL layers.
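
As an illustration of the composite model above, here is a minimal sketch of how a signed epoch-nanosecond count could be split into the two components and recombined. The class and method names are hypothetical, and the use of floor division (so that the nanosecond part stays in 0..999 for instants before the epoch) is an assumption; the issue does not prescribe the conversion.

```java
// Hypothetical helper, not part of Spark: splits epoch nanoseconds into
// (epoch microseconds, nanoseconds within the microsecond).
final class NanosSplitSketch {
    static final long NANOS_PER_MICRO = 1_000L;

    // Floor division keeps the remainder non-negative for negative inputs.
    static long epochMicros(long epochNanos) {
        return Math.floorDiv(epochNanos, NANOS_PER_MICRO);
    }

    // Result is always in [0, 999], so it fits in a short.
    static short nanosInMicro(long epochNanos) {
        return (short) Math.floorMod(epochNanos, NANOS_PER_MICRO);
    }

    // Inverse of the split; throws ArithmeticException on long overflow.
    static long toEpochNanos(long epochMicros, short nanosInMicro) {
        return Math.addExact(
            Math.multiplyExact(epochMicros, NANOS_PER_MICRO), nanosInMicro);
    }
}
```

With this convention, one nanosecond before the epoch is represented as micros = -1 and nanos-in-micro = 999 rather than micros = 0 and a negative remainder.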

This is the *unblocker* for downstream work (cast, Parquet, expressions). It is 
intentionally small: no SQL parser, no SQLConf preview, no casts, no Parquet, 
no {{TypeOps}} / Types Framework requirement.

_Ordering / compare / hash_ for these types is *out of scope* and will be 
tracked in a separate follow-up issue.

h3. What to do

*common/unsafe*
* Add {{org.apache.spark.unsafe.types.TimestampNTZNanos}} (name as 
implemented): immutable value with {{long}} epoch micros + {{short}} 
nanos-in-micro ∈ [0, 999]; {{equals}} / {{hashCode}}.
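
A rough sketch of what such a value class might look like follows. The class name {{NanosTimestampSketch}}, the field names, and the choice to reject out-of-range nanos with {{IllegalArgumentException}} are assumptions made for illustration; the actual class in {{common/unsafe}} may differ.

```java
// Hypothetical immutable value: epoch micros plus nanos within the microsecond.
final class NanosTimestampSketch {
    final long epochMicros;
    final short nanosInMicro;

    NanosTimestampSketch(long epochMicros, short nanosInMicro) {
        // Enforce the [0, 999] invariant stated in the issue.
        if (nanosInMicro < 0 || nanosInMicro > 999) {
            throw new IllegalArgumentException(
                "nanosInMicro must be in [0, 999], got " + nanosInMicro);
        }
        this.epochMicros = epochMicros;
        this.nanosInMicro = nanosInMicro;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof NanosTimestampSketch)) return false;
        NanosTimestampSketch that = (NanosTimestampSketch) other;
        return epochMicros == that.epochMicros && nanosInMicro == that.nanosInMicro;
    }

    @Override
    public int hashCode() {
        // Combines both fields; equal values always hash equally.
        return 31 * Long.hashCode(epochMicros) + Short.hashCode(nanosInMicro);
    }
}
```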

*PhysicalDataType*
* Add {{PhysicalTimestampNanosType}} with {{InternalType}} = the unsafe value 
class.
* Register {{TimestampNTZNanosType}} and {{TimestampLTZNanosType}} in 
{{PhysicalDataType.applyDefault}} (no {{UninitializedPhysicalType}} 
fall-through).

*InternalRow*
* Add get/set accessors on {{GenericInternalRow}} (and wiring in 
{{InternalRow}} accessor dispatch) for the new physical type.

*UnsafeRow*
* Store values using the same pattern as {{PhysicalCalendarIntervalType}} 
(variable-length field: the 8-byte fixed slot holds an offset and size that 
point to a fixed-size payload), since the 10-byte value does not fit in a 
single 8-byte word.
* Implement read and write on {{UnsafeRow}}; update {{UnsafeRow.isFixedLength}} 
/ size estimation if needed.
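
The storage pattern described above can be sketched with a plain {{ByteBuffer}} standing in for a one-field row. This is an illustration only: it does not use {{UnsafeRow}} itself, the class name is hypothetical, and the 16-byte padded payload, the recorded size, and the byte order are assumptions that the issue does not specify.

```java
import java.nio.ByteBuffer;

// Hypothetical one-field row: [null bitset][fixed 8-byte slot][payload].
final class NanosRowLayoutSketch {
    static final int NULL_BITSET_BYTES = 8;  // one word of null bits
    static final int FIXED_SLOT_BYTES = 8;   // one slot per field
    static final int PAYLOAD_BYTES = 16;     // 10 bytes padded to a word multiple (assumption)

    static ByteBuffer write(long epochMicros, short nanosInMicro) {
        int payloadOffset = NULL_BITSET_BYTES + FIXED_SLOT_BYTES;
        ByteBuffer row = ByteBuffer.allocate(payloadOffset + PAYLOAD_BYTES);
        // The fixed slot packs the payload offset (high 32 bits) and size (low 32 bits).
        long offsetAndSize = ((long) payloadOffset << 32) | PAYLOAD_BYTES;
        row.putLong(NULL_BITSET_BYTES, offsetAndSize);
        // The payload holds 8 bytes of micros followed by 2 bytes of nanos-in-micro.
        row.putLong(payloadOffset, epochMicros);
        row.putShort(payloadOffset + 8, nanosInMicro);
        return row;
    }

    static int payloadOffset(ByteBuffer row) {
        return (int) (row.getLong(NULL_BITSET_BYTES) >>> 32);
    }

    static int payloadSize(ByteBuffer row) {
        return (int) row.getLong(NULL_BITSET_BYTES);
    }

    static long readEpochMicros(ByteBuffer row) {
        return row.getLong(payloadOffset(row));
    }

    static short readNanosInMicro(ByteBuffer row) {
        return row.getShort(payloadOffset(row) + 8);
    }
}
```

Because the payload size is fixed, a reader only needs the offset to locate both components; the size is kept in the slot here simply to follow the offset-and-size convention.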

*Codegen / getters*
* {{SpecializedGettersReader}} and {{CodeGenerator}} read path for 
{{PhysicalTimestampNanosType}}; write path included if required for roundtrip 
tests or projection writers.

*Literals*
* Extend {{Literal}} validation in {{literals.scala}} to accept the unsafe 
value type for nanos timestamp physical type.

h3. Tests

* {{DataTypeSuite}}: {{PhysicalDataType(TimestampNTZNanosType(p))}} and the LTZ 
variant are not {{UninitializedPhysicalType}}; {{defaultSize}} remains 10.
* New or extended suite: {{InternalRow}} set/get roundtrip for non-null and 
null.
* {{UnsafeRow}} write/read roundtrip for a struct with nanos timestamp 
column(s).
* Regression: microsecond {{TimestampType}} / {{TimestampNTZType}} unchanged.

h3. Acceptance criteria

* {{PhysicalDataType.apply}} returns a concrete physical type for 
{{TimestampNTZNanosType}} and {{TimestampLTZNanosType}} for all valid p ∈ [7, 
9].
* Values can be written to and read from {{UnsafeRow}} and 
{{GenericInternalRow}} without falling through to uninitialized physical type 
or generic unsupported-physical-type failures in tests.
* Codegen and interpreted getters can read a bound column of this physical type 
in a minimal projection test.
* No change to behavior of {{TimestampType}}, {{TimestampNTZType}}, or existing 
microsecond storage.
* Downstream issues (parser, SQLConf, cast, Parquet) can depend on this issue 
and assume the SPIP composite row layout.

h3. References

* Precedent: {{PhysicalCalendarIntervalType}} + {{CalendarInterval}} unsafe type


