[ 
https://issues.apache.org/jira/browse/SPARK-57455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-57455:
-----------------------------
    Affects Version/s: 4.3.0
                           (was: 5.0.0)

> Support nanosecond-precision timestamp types in the ORC datasource (v1 and v2)
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-57455
>                 URL: https://issues.apache.org/jira/browse/SPARK-57455
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> Umbrella: SPARK-56822 (Timestamps with nanosecond precision).
> Add read and write support for the nanosecond-capable timestamp types
> TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p), with p from 7 to 9, so that the ORC
> datasource reaches parity with the microsecond TimestampType /
> TimestampNTZType. Remove the SPARK-57166 rejection guardrail
> (supportDataType / supportsDataType) once read and write are implemented
> and tested, and update FileBasedDataSourceSuite accordingly. Cover
> precisions 7 to 9 for both NTZ and LTZ.
>
> Scope (core + Hive ORC):
> - Type mapping in OrcUtils (orcTypeDescription / toCatalystSchema):
> TimestampLTZNanosType via native ORC timestamp (seconds+nanos, lossless);
> TimestampNTZNanosType via ORC LONG with a catalyst attribute.
> - Write: OrcSerializer (preserve sub-microsecond nanos); Hive path via
> HiveInspectors.wrapperFor.
> - Read, non-vectorized: OrcDeserializer.
> - Read, vectorized: OrcAtomicColumnVector (build TimestampNanosVal from the
> scratch java.sql.Timestamp).
> - Guardrails: core OrcFileFormat, v2 OrcTable, Hive ORC OrcFileFormat.
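
The "seconds+nanos, lossless" mapping and the "build TimestampNanosVal from the
scratch java.sql.Timestamp" step can be illustrated with a minimal sketch. It
is not the Spark implementation: the ticket does not describe the layout of
TimestampNanosVal, so the NanosValue holder below (whole microseconds since
the epoch plus the remaining 0..999 nanoseconds) and both helper methods are
hypothetical names chosen for illustration. Only java.sql.Timestamp calls from
the JDK are used.

```java
import java.sql.Timestamp;

public class NanosTimestampSketch {

    // Hypothetical holder, assumed for this sketch only: microseconds since
    // the epoch plus the nanoseconds left over within that microsecond.
    static final class NanosValue {
        final long epochMicros;
        final int nanosWithinMicro; // 0..999

        NanosValue(long epochMicros, int nanosWithinMicro) {
            this.epochMicros = epochMicros;
            this.nanosWithinMicro = nanosWithinMicro;
        }
    }

    // Read direction: split a java.sql.Timestamp into micros + leftover nanos.
    static NanosValue fromSqlTimestamp(Timestamp ts) {
        // getTime() is in milliseconds; floorDiv keeps pre-1970 values correct.
        long epochSeconds = Math.floorDiv(ts.getTime(), 1000L);
        int nanosOfSecond = ts.getNanos(); // 0..999_999_999, always non-negative
        long micros = Math.addExact(
            Math.multiplyExact(epochSeconds, 1_000_000L), nanosOfSecond / 1000);
        return new NanosValue(micros, nanosOfSecond % 1000);
    }

    // Write direction: rebuild a java.sql.Timestamp without dropping the
    // sub-microsecond digits.
    static Timestamp toSqlTimestamp(NanosValue v) {
        long epochSeconds = Math.floorDiv(v.epochMicros, 1_000_000L);
        long microsOfSecond = Math.floorMod(v.epochMicros, 1_000_000L);
        Timestamp ts = new Timestamp(epochSeconds * 1000L);
        ts.setNanos((int) (microsOfSecond * 1000L + v.nanosWithinMicro));
        return ts;
    }

    public static void main(String[] args) {
        Timestamp ts = new Timestamp(1000L); // one second after the epoch
        ts.setNanos(123_456_789);            // nine fractional digits
        NanosValue v = fromSqlTimestamp(ts);
        System.out.println(v.epochMicros + " us + " + v.nanosWithinMicro + " ns");
        System.out.println(toSqlTimestamp(v).equals(ts));
    }
}
```

Using floorDiv and floorMod rather than / and % keeps the split consistent for
timestamps before 1970, where java.sql.Timestamp still reports a non-negative
nanos field.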



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
