[ 
https://issues.apache.org/jira/browse/SPARK-56981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-56981:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add physical representation and UnsafeRow support for nanosecond-capable 
> timestamp types
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-56981
>                 URL: https://issues.apache.org/jira/browse/SPARK-56981
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Max Gekk
>            Assignee: Max Gekk
>            Priority: Major
>              Labels: pull-request-available
>
> h3. Summary
> [PR #55952|https://github.com/apache/spark/pull/55952] / SPARK-56876 added 
> _logical_ types {{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}} 
> (p ∈ [7, 9]) and JSON metadata. These types still map to 
> {{UninitializedPhysicalType}} in {{PhysicalDataType.apply}}, so the engine 
> cannot store or access values in {{InternalRow}} / {{UnsafeRow}}.
> This issue delivers the _minimum_ physical layer aligned with the merged SPIP 
> model: *epoch microseconds (8 bytes) + nanoseconds within the microsecond 
> (0–999, 2 bytes)*; see {{defaultSize = 10}} on the logical types. A single 
> unsafe value representation at the row layer serves both the NTZ and LTZ 
> nanos types; their semantic differences stay in the logical and SQL layers.
> This is the *unblocker* for downstream work (cast, Parquet, expressions). It 
> is intentionally small: no SQL parser, no SQLConf preview, no casts, no 
> Parquet, no {{TypeOps}} / Types Framework requirement.
> _Ordering / compare / hash_ for these types is *out of scope* and will be 
> tracked in a separate follow-up issue.
> h3. What to do
> *common/unsafe*
> * Add {{org.apache.spark.unsafe.types.TimestampNTZNanos}} (name as 
> implemented): immutable value with {{long}} epoch micros + {{short}} 
> nanos-in-micro ∈ [0, 999]; {{equals}} / {{hashCode}}.
> *PhysicalDataType*
> * Add {{PhysicalTimestampNanosType}} with {{InternalType}} = the unsafe value 
> class.
> * Register {{TimestampNTZNanosType}} and {{TimestampLTZNanosType}} in 
> {{PhysicalDataType.applyDefault}} (no {{UninitializedPhysicalType}} 
> fall-through).
> *InternalRow*
> * Add get/set accessors on {{GenericInternalRow}} (and wiring in 
> {{InternalRow}} accessor dispatch) for the new physical type.
> *UnsafeRow*
> * Store values using the same pattern as {{PhysicalCalendarIntervalType}} 
> (non-fixed field: pointer in the 8-byte word + fixed payload), since the 10 
> logical bytes do not fit in a single 8-byte word.
> * Implement read and write on {{UnsafeRow}}; update 
> {{UnsafeRow.isFixedLength}} / size estimation if needed.
> *Codegen / getters*
> * {{SpecializedGettersReader}} and {{CodeGenerator}} read path for 
> {{PhysicalTimestampNanosType}}; write path included if required for roundtrip 
> tests or projection writers.
> *Literals*
> * Extend {{Literal}} validation in {{literals.scala}} to accept the unsafe 
> value type for nanos timestamp physical type.
> h3. Tests
> * {{DataTypeSuite}}: {{PhysicalDataType(TimestampNTZNanosType(p))}} and LTZ 
> variant are not {{UninitializedPhysicalType}}; {{defaultSize}} remains 10.
> * New or extended suite: {{InternalRow}} set/get roundtrip for non-null and 
> null.
> * {{UnsafeRow}} write/read roundtrip for a struct with nanos timestamp 
> column(s).
> * Regression: microsecond {{TimestampType}} / {{TimestampNTZType}} unchanged.
> h3. Acceptance criteria
> * {{PhysicalDataType.apply}} returns a concrete physical type for 
> {{TimestampNTZNanosType}} and {{TimestampLTZNanosType}} for all valid p ∈ [7, 
> 9].
> * Values can be written to and read from {{UnsafeRow}} and 
> {{GenericInternalRow}} without falling through to uninitialized physical type 
> or generic unsupported-physical-type failures in tests.
> * Codegen and interpreted getters can read a bound column of this physical 
> type in a minimal projection test.
> * No change to behavior of {{TimestampType}}, {{TimestampNTZType}}, or 
> existing microsecond storage.
> * Downstream issues (parser, SQLConf, cast, Parquet) can depend on this issue 
> and assume the SPIP composite row layout.
> h3. References
> * Precedent: {{PhysicalCalendarIntervalType}} + {{CalendarInterval}} unsafe 
> type
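
The value described under *common/unsafe* (a {{long}} of epoch microseconds plus a {{short}} of nanoseconds within the microsecond, restricted to [0, 999]) could look roughly like the sketch below. This is an illustration only, not the class from the pull request: the name TimestampNanosSketch, its field names, and the ofEpochNanos helper are all hypothetical and do not come from the issue.

```java
// Hypothetical sketch of an immutable "epoch micros + nanos within the micro" value.
// The class name, field names, and ofEpochNanos are assumptions for illustration.
final class TimestampNanosSketch {
    final long epochMicros;        // microseconds since the epoch (8 bytes)
    final short nanosWithinMicro;  // sub-microsecond part, 0..999 (2 bytes)

    TimestampNanosSketch(long epochMicros, short nanosWithinMicro) {
        if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
            throw new IllegalArgumentException(
                "nanosWithinMicro must be in [0, 999]: " + nanosWithinMicro);
        }
        this.epochMicros = epochMicros;
        this.nanosWithinMicro = nanosWithinMicro;
    }

    // Splits a nanosecond count into the two components. Floor division keeps
    // the sub-microsecond part non-negative for instants before the epoch.
    static TimestampNanosSketch ofEpochNanos(long epochNanos) {
        long micros = Math.floorDiv(epochNanos, 1000L);
        short nanos = (short) Math.floorMod(epochNanos, 1000L);
        return new TimestampNanosSketch(micros, nanos);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof TimestampNanosSketch)) return false;
        TimestampNanosSketch that = (TimestampNanosSketch) other;
        return epochMicros == that.epochMicros
            && nanosWithinMicro == that.nanosWithinMicro;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(epochMicros) + nanosWithinMicro;
    }

    @Override
    public String toString() {
        return "TimestampNanosSketch(" + epochMicros + ", " + nanosWithinMicro + ")";
    }
}
```

Rejecting out-of-range sub-microsecond values in the constructor means two values that denote the same instant always hold identical fields, so field-wise {{equals}} / {{hashCode}} is sufficient.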

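The *UnsafeRow* item asks for the {{PhysicalCalendarIntervalType}} pattern: the field's 8-byte slot holds a pointer, and the payload lives in the variable-length region. A simplified model of that idea, using a plain java.nio.ByteBuffer instead of Spark's {{Platform}} memory access, is sketched below. Everything here is an assumption made for illustration: the class and method names are hypothetical, the null bitset is omitted, and the choices to record a size of 10 and to reserve 16 bytes (rounded up to an 8-byte word) are not taken from the issue.

```java
import java.nio.ByteBuffer;

// Hypothetical model of "pointer in the 8-byte slot + fixed payload".
// It is not Spark code; real rows also carry a null bitset before the slots.
final class NanosRowLayoutSketch {
    static final int PAYLOAD_BYTES = 10;   // 8 bytes of micros + 2 bytes of nanos
    static final int RESERVED_BYTES = 16;  // assumption: payload padded to an 8-byte boundary

    // Writes the payload at `cursor`, stores (offset << 32) | size in the
    // field's slot, and returns the cursor position after the reserved bytes.
    static int write(ByteBuffer row, int slotOffset, int cursor,
                     long epochMicros, short nanosWithinMicro) {
        row.putLong(cursor, epochMicros);
        row.putShort(cursor + 8, nanosWithinMicro);
        row.putLong(slotOffset, ((long) cursor << 32) | PAYLOAD_BYTES);
        return cursor + RESERVED_BYTES;
    }

    // Recovers the payload offset from the upper 32 bits of the slot.
    static int payloadOffset(ByteBuffer row, int slotOffset) {
        return (int) (row.getLong(slotOffset) >>> 32);
    }

    static long readEpochMicros(ByteBuffer row, int slotOffset) {
        return row.getLong(payloadOffset(row, slotOffset));
    }

    static short readNanosWithinMicro(ByteBuffer row, int slotOffset) {
        return row.getShort(payloadOffset(row, slotOffset) + 8);
    }
}
```

In this model a row with one such field needs 8 bytes for the slot plus 16 reserved bytes for the payload, and a reader only needs the slot position to find both components.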


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
