Max Gekk created SPARK-57813:
--------------------------------

             Summary: Support nanosecond-precision timestamps as file-source 
partition columns
                 Key: SPARK-57813
                 URL: https://issues.apache.org/jira/browse/SPARK-57813
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond 
precision).

h2. Problem
In {{PartitioningUtils}}, both partition-value code paths support only 
microsecond precision:
* {{inferPartitionColumnValue}} (datasources/PartitioningUtils.scala ~L498-510) 
only tries {{conf.timestampType}} (microsecond {{TimestampType}} / 
{{TimestampNTZType}});
* {{castPartValueToDesiredType}} (~L565-575) gates on {{AnyTimestampType}} 
(which excludes nanosecond types) and otherwise throws "Unsupported partition 
type".

As a result, a nanosecond timestamp column cannot be used as a partition 
column, even with an explicit schema.

h2. Goal
Allow nanosecond timestamps as partition columns: cast partition string values 
to the declared nanosecond type when an explicit schema is given, and 
optionally infer nanosecond precision from partition values.
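
To make the optional inference concrete, here is a minimal standalone sketch of one possible rule: treat a value as nanosecond-precision only when its fractional seconds carry more than six significant digits. This is not Spark code. The object name, the result types, and the rule itself (including ignoring trailing zeros) are assumptions made for illustration, and the regex checks only the shape of the value, not calendar validity.

```scala
object PartitionPrecisionInference {
  // Hypothetical result types; these are not Spark classes.
  sealed trait Inferred
  case object MicroTimestamp extends Inferred
  case object NanoTimestamp extends Inferred
  case object NotATimestamp extends Inferred

  // Date, time, and an optional fraction of 1 to 9 digits.
  private val TimestampPattern =
    """(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?""".r

  def infer(value: String): Inferred = value match {
    case TimestampPattern(_, _, fraction) =>
      // The group is null when the value has no fractional part.
      val digits = Option(fraction).getOrElse("")
      // Assumption: trailing zeros do not require extra precision.
      val significant = digits.reverse.dropWhile(_ == '0').length
      if (significant > 6) NanoTimestamp else MicroTimestamp
    case _ => NotATimestamp
  }
}
```

Under this rule, {{2024-01-01 12:34:56.123456789}} would be inferred as nanosecond precision, while {{2024-01-01 12:34:56.123456000}} would stay microsecond.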

h2. Scope
Extend {{castPartValueToDesiredType}} to handle {{AnyTimestampNanoType}} via 
{{Cast}}; extend {{inferPartitionColumnValue}} to consider nanosecond precision 
when inferring.
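
As a rough illustration of what casting a partition string to a nanosecond value involves, the sketch below uses only {{java.time}} rather than Spark's {{Cast}}. The object and method names are hypothetical, UTC is assumed for the conversion, and {{URLDecoder}} stands in for Spark's own path unescaping; the real change would go through the existing cast machinery instead.

```scala
import java.net.URLDecoder
import java.nio.charset.StandardCharsets
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField

// Hypothetical helper, not part of Spark.
object NanoPartitionValue {
  // "uuuu-MM-dd HH:mm:ss" followed by 0 to 9 fractional-second digits.
  private val formatter = new DateTimeFormatterBuilder()
    .appendPattern("uuuu-MM-dd HH:mm:ss")
    .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
    .toFormatter()

  /** Converts an escaped partition directory value to nanoseconds since the epoch (UTC assumed). */
  def toEpochNanos(rawDirValue: String): Long = {
    // Assumption: the value is percent-encoded, e.g. %20 for space and %3A for ':'.
    val unescaped = URLDecoder.decode(rawDirValue, StandardCharsets.UTF_8)
    val ldt = LocalDateTime.parse(unescaped, formatter)
    val seconds = ldt.toEpochSecond(ZoneOffset.UTC)
    // Exact arithmetic so out-of-range values fail loudly instead of wrapping.
    Math.addExact(Math.multiplyExact(seconds, 1000000000L), ldt.getNano.toLong)
  }
}
```

A signed 64-bit count of nanoseconds covers only roughly the years 1677 to 2262, which is why the sketch uses exact arithmetic rather than silently overflowing.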

h2. Acceptance criteria
* Reading a partitioned path with a nanosecond partition column (explicit 
schema) works;
* partition pruning on nanosecond partition values works.
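
The pruning criterion matters because partition values may differ only below the microsecond. The following toy sketch, which is not Spark's pruning code, filters hypothetical directory names of the form {{ts=<ISO-8601 local date-time>}} at full nanosecond precision; the object, the directory layout, and the column name {{ts}} are all assumptions.

```scala
import java.time.LocalDateTime

// Toy model of pruning over partition directory names; not Spark code.
object NanoPartitionPruning {
  // LocalDateTime.parse accepts up to nine fractional-second digits.
  def partitionValue(dir: String): LocalDateTime =
    LocalDateTime.parse(dir.stripPrefix("ts="))

  // Keeps only the directories whose value is >= lowerInclusive.
  def prune(dirs: Seq[String], lowerInclusive: LocalDateTime): Seq[String] =
    dirs.filter(d => !partitionValue(d).isBefore(lowerInclusive))
}
```

With three directories whose values end in {{.000000001}}, {{.000000002}}, and {{.000000003}}, a lower bound ending in {{.000000002}} keeps the last two. If the values were truncated to microseconds, all three would compare equal and the filter could not distinguish them.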

h2. Testing
{{ParquetPartitionDiscoverySuite}} / {{FileIndexSuite}}-style tests for 
nanosecond partition columns.

h2. Dependencies
None. This sub-task is independent; it builds on the already-resolved casts.




