Max Gekk created SPARK-57822:
--------------------------------
Summary: Support Parquet predicate pushdown for
nanosecond-precision timestamps
Key: SPARK-57822
URL: https://issues.apache.org/jira/browse/SPARK-57822
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
precision).
h2. Problem
{{ParquetFilters}} (datasources/parquet/ParquetFilters.scala, around lines
149-152, and the filter builders) defines only {{ParquetTimestampMicrosType}}
and {{ParquetTimestampMillisType}}. There is no {{TIMESTAMP(NANOS)}} arm, so
predicates on nanosecond timestamp columns are never converted to Parquet
filters and every row group is scanned. The miss is silent: no error is raised.
h2. Goal
Convert equality/range predicates on nanosecond timestamp columns into Parquet
filters using epoch-nanos, enabling row-group skipping.
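
The value conversion the goal implies could look roughly like the sketch below.
It assumes the column is stored as INT64 nanoseconds since the Unix epoch (the
physical type the Parquet format annotates with {{TIMESTAMP(NANOS)}}); the
{{EpochNanos}} object and its method names are hypothetical and are not existing
Spark APIs. Overflow is checked because a signed 64-bit nanosecond count only
spans roughly the years 1677 to 2262.

```scala
import java.time.Instant

// Hypothetical helper, not part of Spark: maps java.time.Instant to and from
// nanoseconds since the Unix epoch, the value a pushed-down filter would compare.
object EpochNanos {
  private val NanosPerSecond = 1000000000L

  // Throws ArithmeticException when the instant does not fit in a signed
  // 64-bit nanosecond count. The check is slightly conservative: it also
  // rejects the final sub-second sliver at the very bottom of the range.
  def fromInstant(instant: Instant): Long = {
    val wholeSeconds = Math.multiplyExact(instant.getEpochSecond, NanosPerSecond)
    Math.addExact(wholeSeconds, instant.getNano.toLong)
  }

  // floorDiv/floorMod keep the nanosecond-of-second part non-negative for
  // instants before 1970, as Instant.ofEpochSecond expects.
  def toInstant(epochNanos: Long): Instant =
    Instant.ofEpochSecond(
      Math.floorDiv(epochNanos, NanosPerSecond),
      Math.floorMod(epochNanos, NanosPerSecond))
}
```

A literal that overflows would presumably have to leave the predicate unpushed
rather than fail the query, so that results stay the same as without pushdown.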
h2. Scope
Add a nanosecond timestamp type and its value conversion to {{ParquetFilters}}
(matching the file encoding written by {{TimestampNanosParquetOps}}), and wire
it into {{createFilter}}.
h2. Acceptance criteria
* Filters on nanosecond timestamp columns are pushed down and prune row groups;
query results are identical to those with pushdown disabled.
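
To illustrate what "prune row groups" means for the criterion above, here is a
simplified model of min/max-based skipping. It is not Spark's or parquet-mr's
implementation; {{RowGroupStats}}, the predicate classes, and {{canSkip}} are
hypothetical names, and real statistics handling also has to account for nulls
and missing statistics.

```scala
// Hypothetical model: per-row-group min/max of an epoch-nanos column.
final case class RowGroupStats(minNanos: Long, maxNanos: Long)

// Equality and range predicates against an epoch-nanos literal.
sealed trait NanosPredicate
final case class NanosEq(value: Long) extends NanosPredicate
final case class NanosLt(value: Long) extends NanosPredicate
final case class NanosLtEq(value: Long) extends NanosPredicate
final case class NanosGt(value: Long) extends NanosPredicate
final case class NanosGtEq(value: Long) extends NanosPredicate

object RowGroupPruning {
  // True only when no value in [minNanos, maxNanos] can satisfy the predicate,
  // so skipping the row group cannot change the query result.
  def canSkip(predicate: NanosPredicate, stats: RowGroupStats): Boolean =
    predicate match {
      case NanosEq(v)   => v < stats.minNanos || v > stats.maxNanos
      case NanosLt(v)   => stats.minNanos >= v
      case NanosLtEq(v) => stats.minNanos > v
      case NanosGt(v)   => stats.maxNanos <= v
      case NanosGtEq(v) => stats.maxNanos < v
    }
}
```

A test along these lines would compare the rows returned with pushdown enabled
and disabled, and separately check that at least one row group was skipped.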
h2. Testing
{{ParquetFilterSuite}}; {{ParquetTimestampNanosSuite}}.
h2. Dependencies
None - independent of the vectorized-read sub-task (separate code path).