Max Gekk created SPARK-57455:
--------------------------------

             Summary: Support nanosecond-precision timestamp types in the ORC 
datasource (v1 and v2)
                 Key: SPARK-57455
                 URL: https://issues.apache.org/jira/browse/SPARK-57455
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 5.0.0
            Reporter: Max Gekk


Umbrella: SPARK-56822 (Timestamps with nanosecond precision).

Add read and write support for the nanosecond-capable timestamp types 
TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p), with p from 7 to 9, so that the ORC 
datasource reaches parity with the microsecond TimestampType / 
TimestampNTZType. Once read and write are implemented and tested, remove the 
SPARK-57166 rejection guardrail (supportDataType / supportsDataType) and 
update FileBasedDataSourceSuite accordingly. Cover precisions 7-9 for both 
NTZ and LTZ.

Scope (core + hive ORC):
- Type mapping in OrcUtils (orcTypeDescription / toCatalystSchema): 
TimestampLTZNanosType via native ORC timestamp (seconds+nanos, lossless); 
TimestampNTZNanosType via ORC LONG with a catalyst attribute.
- Write: OrcSerializer (preserve sub-microsecond nanos); Hive path via 
HiveInspectors.wrapperFor.
- Read non-vectorized: OrcDeserializer.
- Read vectorized: OrcAtomicColumnVector (build TimestampNanosVal from the 
scratch java.sql.Timestamp).
- Guardrails: core OrcFileFormat, v2 OrcTable, hive orc OrcFileFormat.
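
The seconds+nanos handling mentioned for the LTZ mapping and the vectorized 
read path can be illustrated with a minimal sketch. It uses only 
java.sql.Timestamp from the JDK. The class NanosTimestampSketch and its 
methods are hypothetical names for illustration, and the split into 
"epoch microseconds" plus "nanoseconds within the microsecond" is an assumed 
representation; the ticket does not describe the actual layout of 
TimestampNanosVal.

```java
import java.sql.Timestamp;

// Hypothetical helper, not part of Spark or ORC.
final class NanosTimestampSketch {

    // Build a Timestamp from whole epoch seconds plus a nanosecond fraction
    // in the range 0..999,999,999.
    static Timestamp toSqlTimestamp(long epochSeconds, int nanosOfSecond) {
        // The constructor takes milliseconds; the fraction is set separately
        // so that all nine fractional digits are kept.
        Timestamp ts = new Timestamp(Math.multiplyExact(epochSeconds, 1000L));
        ts.setNanos(nanosOfSecond);
        return ts;
    }

    // Whole seconds since the epoch. floorDiv keeps pre-1970 values correct,
    // because getNanos() is always a non-negative offset from that second.
    static long epochSeconds(Timestamp ts) {
        return Math.floorDiv(ts.getTime(), 1000L);
    }

    // Microseconds since the epoch, dropping the sub-microsecond digits.
    static long epochMicros(Timestamp ts) {
        long wholeSecondMicros = Math.multiplyExact(epochSeconds(ts), 1_000_000L);
        return Math.addExact(wholeSecondMicros, ts.getNanos() / 1_000);
    }

    // The 0..999 nanoseconds that a microsecond-only value would lose.
    static int nanosWithinMicro(Timestamp ts) {
        return ts.getNanos() % 1_000;
    }
}
```

For a value of 1 second and 123,456,789 nanoseconds, epochMicros gives 
1,123,456 and nanosWithinMicro gives 789, the part a microsecond-precision 
path would discard. Reading the fraction through getNanos() rather than 
getTime() matters here because getTime() only carries millisecond precision.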



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
