[ 
https://issues.apache.org/jira/browse/SPARK-42388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars updated SPARK-42388:
-------------------------
    Description: 
The Parquet footer is currently read twice in the vectorized Parquet reader, 
even when there are no filters to push down.
When the NameNode is under heavy load, the second read costs extra time. 
We can avoid this unnecessary footer read by reusing the footer metadata in 
{{VectorizedParquetRecordReader}}.

> Avoid unnecessary parquet footer reads when no filters in vectorized reader
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-42388
>                 URL: https://issues.apache.org/jira/browse/SPARK-42388
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Mars
>            Priority: Major
>
> The Parquet footer is currently read twice in the vectorized Parquet reader, 
> even when there are no filters to push down.
> When the NameNode is under heavy load, the second read costs extra time. 
> We can avoid this unnecessary footer read by reusing the footer metadata in 
> {{VectorizedParquetRecordReader}}.
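
To illustrate the idea only, here is a minimal toy model in Python. It is not Spark or Parquet code: Footer, CountingFooterSource, ToyVectorizedReader and open_reader are hypothetical names invented for this sketch, and it assumes the caller has already read the footer once before the reader is initialized. It shows that handing the already-read footer to the reader removes the second read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Footer:
    """Stand-in for Parquet footer metadata (hypothetical, not a Spark class)."""
    schema: tuple
    num_rows: int


class CountingFooterSource:
    """Pretends to fetch a footer from storage and counts each fetch."""

    def __init__(self, footer: Footer):
        self._footer = footer
        self.reads = 0

    def read_footer(self) -> Footer:
        self.reads += 1  # each call models one round trip to storage
        return self._footer


class ToyVectorizedReader:
    """Toy reader whose initialize() can reuse a footer the caller already has."""

    def __init__(self, source: CountingFooterSource):
        self._source = source
        self.footer: Optional[Footer] = None

    def initialize(self, footer: Optional[Footer] = None) -> None:
        # Go back to storage only when no footer was handed in.
        self.footer = footer if footer is not None else self._source.read_footer()


def open_reader(source: CountingFooterSource, reuse_footer: bool) -> ToyVectorizedReader:
    # Assumption: the caller reads the footer once up front.
    footer = source.read_footer()
    reader = ToyVectorizedReader(source)
    # Passing the footer along avoids a second read inside initialize().
    reader.initialize(footer if reuse_footer else None)
    return reader
```

In this model, open_reader(source, reuse_footer=False) leaves source.reads at 2, while open_reader(source, reuse_footer=True) leaves it at 1, and the reader sees the same metadata in both cases.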



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
