Max Gekk created SPARK-57823:
--------------------------------
Summary: Support ORC predicate pushdown for nanosecond-precision
timestamps
Key: SPARK-57823
URL: https://issues.apache.org/jira/browse/SPARK-57823
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.3.0
Reporter: Max Gekk
This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond
precision).
h2. Problem
{{OrcFilters.getPredicateLeafType}} and {{castLiteralValue}}
(datasources/orc/OrcFilters.scala ~L142-179) have no {{AnyTimestampNanoType}}
arm, so a nanosecond type falls through to the catch-all {{throw
QueryExecutionErrors.unsupportedOperationForDataTypeError(dataType)}}. Because
{{convertibleFilters}} probes convertibility by building the search argument,
that exception propagates, and a filter on a nanosecond column fails when ORC
pushdown runs.
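The failure mode can be illustrated with a simplified model. This is only a sketch: {{ColType}}, {{NanosTs}}, {{ProbeSketch}} and its methods are hypothetical stand-ins, not Spark's actual classes, and the real code in {{OrcFilters}} may be structured differently.

```scala
// Hypothetical stand-ins for Spark data types; NOT the real Spark classes.
sealed trait ColType
case object LongCol extends ColType
case object MicrosTs extends ColType
case object NanosTs extends ColType // stand-in for a nanosecond timestamp type

object ProbeSketch {
  // Shape of a type match that has no nanosecond arm:
  // anything unlisted lands in the catch-all and throws.
  def leafType(t: ColType): String = t match {
    case LongCol  => "LONG"
    case MicrosTs => "TIMESTAMP"
    case other =>
      throw new UnsupportedOperationException(s"Unsupported data type: $other")
  }

  // Convertibility is probed by actually building the leaf, so the
  // exception from leafType escapes instead of yielding `false`.
  def isConvertible(t: ColType): Boolean = {
    leafType(t)
    true
  }
}
```

In this model, {{ProbeSketch.isConvertible(MicrosTs)}} returns {{true}}, while {{ProbeSketch.isConvertible(NanosTs)}} throws rather than reporting the filter as not convertible.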
h2. Goal
Convert predicates on nanosecond timestamp columns to ORC search arguments (or
safely skip them) instead of throwing.
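As a rough illustration of the "safely skip" option, the sketch below returns {{None}} for a type with no leaf mapping and leaves such filters to be evaluated after the scan. All names here ({{FieldType}}, {{GtFilter}}, {{SkipSketch}}) are hypothetical and only model the idea; they are not taken from Spark.

```scala
// Hypothetical, simplified model; not Spark's real filter or type classes.
sealed trait FieldType
case object IntField extends FieldType
case object NanosTsField extends FieldType // stand-in for a nanosecond timestamp

final case class GtFilter(column: String, fieldType: FieldType)

object SkipSketch {
  // Returns None when no ORC leaf mapping is known, instead of throwing.
  def leafTypeOpt(t: FieldType): Option[String] = t match {
    case IntField     => Some("LONG")
    case NanosTsField => None
  }

  // Splits filters into (pushable, skipped). Skipped filters are simply
  // not pushed down; the engine still applies them to the rows it reads.
  def split(filters: Seq[GtFilter]): (Seq[GtFilter], Seq[GtFilter]) =
    filters.partition(f => leafTypeOpt(f.fieldType).isDefined)
}
```

Skipping only gives up the pruning benefit for that predicate; correctness is preserved as long as the filter is still evaluated on the scanned rows.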
h2. Scope
Add nanosecond arms to {{getPredicateLeafType}} and {{castLiteralValue}}, using
the ORC timestamp representation consistent with the ORC nanosecond read/write
(SPARK-57455).
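A minimal sketch of what a {{castLiteralValue}} arm might do, assuming the literal arrives as nanoseconds since the epoch in a {{Long}} (an assumption for illustration; the actual internal representation is defined by SPARK-56822 and SPARK-57455). ORC timestamp search-argument literals are {{java.sql.Timestamp}} values, which carry a nanosecond field. {{NanosLiteralSketch}} and {{nanosToJavaTimestamp}} are hypothetical names.

```scala
import java.sql.Timestamp
import java.time.Instant

object NanosLiteralSketch {
  private val NanosPerSecond = 1000000000L

  // Assumes `epochNanos` is nanoseconds since 1970-01-01T00:00:00Z.
  // floorDiv/floorMod keep the nano-of-second in [0, 999999999]
  // for values before the epoch as well.
  def nanosToJavaTimestamp(epochNanos: Long): Timestamp = {
    val seconds = Math.floorDiv(epochNanos, NanosPerSecond)
    val nanoOfSecond = Math.floorMod(epochNanos, NanosPerSecond)
    // Timestamp.from preserves the Instant's nanosecond component.
    Timestamp.from(Instant.ofEpochSecond(seconds, nanoOfSecond))
  }
}
```

This shows only the literal conversion. Whether ORC's stored statistics are compared at full nanosecond precision is a separate question that this sketch does not address.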
h2. Non-goals
Footer MIN/MAX aggregate pushdown is NOT included - {{AggregatePushDownUtils}}
(~L69-87) excludes all timestamp types (microsecond and nanosecond alike), so
it is not a nanosecond-specific parity gap.
h2. Acceptance criteria
* Filters on nanosecond ORC columns are either pushed down or skipped, without
error, and query results are correct.
h2. Testing
{{OrcFilterSuite}}; ORC nanosecond query tests.
h2. Dependencies
None - independent (ORC datasource support resolved in SPARK-57455).