Max Gekk created SPARK-57813:
--------------------------------
Summary: Support nanosecond-precision timestamps as file-source
partition columns
Key: SPARK-57813
URL: https://issues.apache.org/jira/browse/SPARK-57813
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
precision).
h2. Problem
In {{PartitioningUtils}} both partition-value code paths (inference and casting)
support only microsecond precision:
* {{inferPartitionColumnValue}} (datasources/PartitioningUtils.scala ~L498-510)
only tries {{conf.timestampType}} (microsecond {{TimestampType}} /
{{TimestampNTZType}});
* {{castPartValueToDesiredType}} (~L565-575) gates on {{AnyTimestampType}}
(which excludes the nanosecond types) and otherwise throws "Unsupported
partition type".
So a nanosecond timestamp column cannot be used as a partition column even with
an explicit schema.
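For illustration only (plain {{java.time}}, no Spark APIs), the sketch below shows why a microsecond-only type is not enough. It treats partition values as zone-less wall-clock strings, an assumption made here for simplicity, and converts them to microseconds since the epoch. {{MicroTruncation}} is a hypothetical name.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Illustration only (not Spark code): what a microsecond-only path loses.
// Assumption: partition values are zone-less wall-clock strings.
object MicroTruncation {
  // Parses "yyyy-MM-dd HH:mm:ss[.fffffffff]" with 0 to 9 fractional digits.
  def parse(value: String): LocalDateTime =
    LocalDateTime.parse(value.replace(' ', 'T'))

  // Microseconds since 1970-01-01 00:00:00; digits below 1 us are dropped.
  // getNano is never negative, so the division truncates toward the past.
  def toEpochMicros(ts: LocalDateTime): Long =
    Math.addExact(
      Math.multiplyExact(ts.toEpochSecond(ZoneOffset.UTC), 1000000L),
      ts.getNano / 1000L)
}
```

Parsing {{2024-01-01 12:00:00.123456789}} and {{2024-01-01 12:00:00.123456001}} gives two different {{LocalDateTime}} values, but {{toEpochMicros}} maps both to the same {{Long}}, so a micro-precision partition value cannot tell the two directories apart.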
h2. Goal
Allow nanosecond timestamps as partition columns: cast partition string values
to the declared nanosecond type when an explicit schema is given, and optionally
infer nanosecond precision from partition values.
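A minimal sketch of the explicit-schema cast, under stated assumptions: the nanosecond timestamp is modelled as a {{Long}} of nanoseconds since 1970-01-01 00:00:00 with no time zone (the actual representation is defined under SPARK-56822 and may differ), the input is an already-unescaped partition value, and {{NanoPartitionCast}} is a hypothetical helper rather than the change to {{castPartValueToDesiredType}} itself.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical sketch, not Spark code: cast an (already unescaped) partition
// directory value to a nanosecond timestamp. Assumed encoding: a Long of
// nanoseconds since 1970-01-01 00:00:00, with no time zone.
object NanoPartitionCast {
  // Hive-style directory name used for a null partition value.
  val NullPartition = "__HIVE_DEFAULT_PARTITION__"

  // addExact/multiplyExact throw ArithmeticException outside the Long range
  // (roughly years 1677 to 2262) instead of silently wrapping.
  def toEpochNanos(ts: LocalDateTime): Long =
    Math.addExact(
      Math.multiplyExact(ts.toEpochSecond(ZoneOffset.UTC), 1000000000L),
      ts.getNano.toLong)

  // None for the null marker; LocalDateTime.parse throws DateTimeParseException
  // when the value does not match the declared timestamp type.
  def cast(raw: String): Option[Long] =
    if (raw == NullPartition) None
    else Some(toEpochNanos(LocalDateTime.parse(raw.replace(' ', 'T'))))
}
```

For example, {{cast("1970-01-01 00:00:00.000000001")}} returns {{Some(1L)}} and {{cast("1969-12-31 23:59:59.999999999")}} returns {{Some(-1L)}}. The exact-arithmetic calls are a deliberate choice: a {{Long}} of nanoseconds covers only roughly 1677 to 2262, and an out-of-range partition value should fail loudly rather than wrap.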
h2. Scope
Extend {{castPartValueToDesiredType}} to handle {{AnyTimestampNanoType}} via
{{Cast}}; extend {{inferPartitionColumnValue}} to consider nanosecond precision
when inferring.
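One possible shape for the inference side, sketched under assumptions: the result names ({{MicroTs}}, {{NanoTs}}, {{Str}}) are placeholders rather than Spark {{DataType}} classes, only the timestamp-versus-string decision is modelled (real partition inference also tries other types such as integers, decimals and dates), and a column is widened to nanoseconds when any value has non-zero sub-microsecond digits. The ticket describes this inference as optional.

```scala
import java.time.LocalDateTime
import scala.util.Try

// Hypothetical sketch of precision-aware inference for one partition column.
// The result names are placeholders, not Spark DataType classes.
object NanoPartitionInference {
  sealed trait Inferred
  case object MicroTs extends Inferred // every value fits microsecond precision
  case object NanoTs extends Inferred  // some value has non-zero sub-microsecond digits
  case object Str extends Inferred     // some value is not a timestamp at all

  def inferOne(value: String): Inferred =
    Try(LocalDateTime.parse(value.replace(' ', 'T'))).toOption match {
      case None                               => Str
      case Some(ts) if ts.getNano % 1000 != 0 => NanoTs
      case Some(_)                            => MicroTs
    }

  // Widen across all values: a non-timestamp forces string, then nanos beat micros.
  def inferColumn(values: Seq[String]): Inferred = {
    require(values.nonEmpty, "need at least one partition value")
    val kinds = values.map(inferOne)
    if (kinds.contains(Str)) Str
    else if (kinds.contains(NanoTs)) NanoTs
    else MicroTs
  }
}
```

With this rule a column containing {{2024-01-01 12:00:00.123456789}} infers {{NanoTs}}, while a column whose values all fit in microseconds keeps the existing micro type. Checking the parsed value rather than counting written digits means {{12:00:00.123456000}} still infers micros; counting digits would be the alternative if the written precision should win.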
h2. Acceptance criteria
* Reading a partitioned path with a nanosecond partition column works when an
explicit schema is given.
* Partition pruning on nanosecond partition values works.
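To make the pruning criterion concrete, here is a hypothetical sketch (not Spark's pruning code) that filters partition directory values with a {{ts > literal}} predicate at either nanosecond or microsecond precision. It assumes zone-less values encoded as {{Long}} nanoseconds since the epoch; {{NanoPruning}} and its methods are hypothetical names.

```scala
import java.time.{LocalDateTime, ZoneOffset}

// Hypothetical sketch, not Spark's pruning code: keep the partition directories
// whose timestamp value satisfies `ts > literal`, at a chosen precision.
object NanoPruning {
  private def parse(s: String): LocalDateTime =
    LocalDateTime.parse(s.replace(' ', 'T'))

  // Nanoseconds since 1970-01-01 00:00:00 (zone-less); throws on Long overflow.
  def epochNanos(s: String): Long = {
    val ts = parse(s)
    Math.addExact(
      Math.multiplyExact(ts.toEpochSecond(ZoneOffset.UTC), 1000000000L),
      ts.getNano.toLong)
  }

  // Microsecond key; floorDiv truncates toward the past, including before 1970.
  def epochMicros(s: String): Long = Math.floorDiv(epochNanos(s), 1000L)

  def pruneGreaterThan(partitions: Seq[String], literal: String,
                       key: String => Long): Seq[String] = {
    val bound = key(literal)
    partitions.filter(p => key(p) > bound)
  }
}
```

With partitions {{12:00:00.123456001}} and {{12:00:00.123456789}} (same day) and the literal {{12:00:00.123456500}}, the nanosecond key keeps only the second directory, whereas the microsecond key sees three equal values and prunes both, dropping a partition the predicate should keep.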
h2. Testing
{{ParquetPartitionDiscoverySuite}} / {{FileIndexSuite}}-style tests for
nanosecond partition columns.
h2. Dependencies
None - independent (builds on resolved casts).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)