Max Gekk created SPARK-57828:
--------------------------------

             Summary: Add vectorized Parquet reader support for nanosecond-precision timestamp types
                 Key: SPARK-57828
                 URL: https://issues.apache.org/jira/browse/SPARK-57828
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.3.0
            Reporter: Max Gekk


This sub-task is part of the umbrella SPARK-56822 (timestamps with nanosecond precision).

h2. Problem
Reading nanosecond timestamps from Parquet currently works only through the row-based reader. {{TimestampNanosParquetOps.isBatchReadSupported}} (datasources/parquet/TimestampNanosParquetOps.scala ~L54-58) still returns the trait default {{false}}, and {{ParquetUtils.isBatchReadSupportedForSchema}} requires every column to support batch read. As a result, a single nanosecond column forces the whole file onto {{ParquetRowConverter}} even when vectorized reading is enabled. {{ParquetVectorUpdaterFactory}} (~L140-161) has timestamp updaters only for MICROS and MILLIS. The current path is correct but slow; Parquet WRITE and non-vectorized READ are already done (SPARK-57102).
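A simplified model of the gating described above, as a sketch only: {{BatchReadGate}} and {{ColumnType}} are hypothetical names, and the real {{ParquetUtils.isBatchReadSupportedForSchema}} inspects Spark data types rather than an enum. It shows why one nanosecond column disables batch reading for the entire file.

```java
import java.util.List;

// Hypothetical model of the schema-level gate: a file gets the vectorized
// reader only if EVERY column type supports batch reads. Not Spark's API.
final class BatchReadGate {
  enum ColumnType {
    INT64(true),
    TIMESTAMP_MICROS(true),
    // Mirrors the current trait default: nanosecond timestamps opt out.
    TIMESTAMP_NANOS(false);

    final boolean batchReadSupported;

    ColumnType(boolean batchReadSupported) {
      this.batchReadSupported = batchReadSupported;
    }
  }

  // One unsupported column makes the whole schema fall back to the row reader.
  static boolean isBatchReadSupportedForSchema(List<ColumnType> schema) {
    return schema.stream().allMatch(t -> t.batchReadSupported);
  }

  public static void main(String[] args) {
    List<ColumnType> withoutNanos = List.of(ColumnType.INT64, ColumnType.TIMESTAMP_MICROS);
    List<ColumnType> withNanos = List.of(ColumnType.INT64, ColumnType.TIMESTAMP_NANOS);
    System.out.println("without nanos -> vectorized: " + isBatchReadSupportedForSchema(withoutNanos));
    System.out.println("with nanos    -> vectorized: " + isBatchReadSupportedForSchema(withNanos));
  }
}
```

Flipping {{isBatchReadSupported}} for the nanosecond ops corresponds to changing the {{false}} on {{TIMESTAMP_NANOS}} in this model, which is only safe once a matching vectorized updater exists.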

h2. Goal
Read {{TIMESTAMP(NANOS)}} columns through the vectorized path, materializing each value as the 16-byte {{TimestampNanosVal}} (epoch microseconds plus a nanosecond remainder) in the columnar batch.
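Parquet stores {{TIMESTAMP(NANOS)}} as an INT64 count of nanoseconds since the epoch, so materializing a value means splitting it into epoch micros and a remainder. A minimal sketch, assuming the remainder is the 0-999 nanoseconds within the microsecond (the exact field widths of {{TimestampNanosVal}} are not given here); {{NanosSplit}} is a hypothetical helper. Floor division and floor modulo keep pre-1970 values consistent, so the remainder is never negative.

```java
// Hypothetical helper: split Parquet INT64 epoch nanos into
// (epoch micros, nanosecond remainder in [0, 999]).
final class NanosSplit {
  static long epochMicros(long epochNanos) {
    // Floor division rounds toward negative infinity, so -1 ns maps to -1 us.
    return Math.floorDiv(epochNanos, 1_000L);
  }

  static int nanosRemainder(long epochNanos) {
    // Floor modulo is always in [0, 999] for a positive divisor.
    return (int) Math.floorMod(epochNanos, 1_000L);
  }

  public static void main(String[] args) {
    long oneNanoBeforeEpoch = -1L;
    System.out.println(epochMicros(oneNanoBeforeEpoch) + " us + "
        + nanosRemainder(oneNanoBeforeEpoch) + " ns");
  }
}
```

With truncating division ({{/}} and {{%}}) the same input would yield 0 us and a remainder of -1, which breaks the invariant {{micros * 1000 + remainder == nanos}} only in the sense that the remainder leaves its expected range.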

h2. Scope
Implement a nanosecond updater in {{ParquetVectorUpdaterFactory}} / {{VectorizedColumnReader}}; flip {{isBatchReadSupported}} to {{true}} for the nanosecond ops; and ensure the columnar layout matches the nanosecond {{ColumnVector}} added under SPARK-57100.
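A rough sketch of what a nanosecond updater has to do per batch, assuming a columnar layout of two child arrays (epoch micros and 0-999 nanosecond remainders) plus a null mask. {{NanosColumnSketch}} and its methods are hypothetical and deliberately avoid Spark's real {{WritableColumnVector}} / {{ParquetVectorUpdater}} interfaces; the actual layout is whatever SPARK-57100 defined. The caller is assumed to resolve definition levels and pass only runs of non-null values to {{readValues}}.

```java
// Hypothetical columnar layout for nanosecond timestamps. NOT Spark's API.
final class NanosColumnSketch {
  final long[] micros;   // epoch microseconds per row
  final int[] nanos;     // nanosecond remainder in [0, 999] per row
  final boolean[] isNull;

  NanosColumnSketch(int capacity) {
    micros = new long[capacity];
    nanos = new int[capacity];
    isNull = new boolean[capacity];
  }

  // Single conversion point shared by the plain and dictionary paths.
  private void put(int row, long epochNanos) {
    micros[row] = Math.floorDiv(epochNanos, 1_000L);
    nanos[row] = (int) Math.floorMod(epochNanos, 1_000L);
  }

  // Plain-encoded page: copy a run of `total` non-null INT64 values
  // from pageValues[pagePos...] into rows [offset, offset + total).
  void readValues(int total, int offset, long[] pageValues, int pagePos) {
    for (int i = 0; i < total; i++) {
      put(offset + i, pageValues[pagePos + i]);
    }
  }

  // Dictionary-encoded page: ids is indexed by row; each id points into
  // the page's INT64 dictionary. Null rows are left untouched.
  void decodeDictionaryIds(int total, int offset, int[] ids, long[] dictionary) {
    for (int i = 0; i < total; i++) {
      int row = offset + i;
      if (!isNull[row]) {
        put(row, dictionary[ids[row]]);
      }
    }
  }

  void putNull(int row) {
    isNull[row] = true;
  }

  public static void main(String[] args) {
    NanosColumnSketch col = new NanosColumnSketch(3);
    col.readValues(2, 0, new long[] {1_000_000_123L, -1L}, 0);
    col.putNull(2);
    for (int r = 0; r < 3; r++) {
      System.out.println(col.isNull[r] ? "null" : col.micros[r] + " us + " + col.nanos[r] + " ns");
    }
  }
}
```

Routing both encodings through one {{put}} keeps plain and dictionary-encoded pages producing the same micros/remainder pair, which is what the row-reader comparison in the acceptance criteria would check.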

h2. Acceptance criteria
* Nanosecond Parquet files read correctly with {{spark.sql.parquet.enableVectorizedReader=true}} and match the row-reader results; nanosecond-only schemas do not fall back to the row-based reader.

h2. Testing
{{ParquetTimestampNanosSuite}} (with the vectorized reader enabled) and {{ParquetQuerySuite}}.

h2. Dependencies
None; this task is independent (the columnar layout was resolved in SPARK-57100).




--
This message was sent by Atlassian Jira
(v8.20.10#820010)