[jira] [Updated] (SPARK-36527) Implement lazy materialization for the vectorized Parquet reader

Chao Sun (Jira) Mon, 16 Aug 2021 11:32:04 -0700


     [ https://issues.apache.org/jira/browse/SPARK-36527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chao Sun updated SPARK-36527:
-----------------------------
    Description: At the moment the Parquet vectorized reader will eagerly 
decode all the columns that are in the read schema, before any filter has been 
applied to them. This is costly. Instead it's better to only materialize these 
column vectors when the data are actually needed.  (was: At the moment the 
Parquet vectorized reader will eagerly decode all the columns that are in the 
read schema, before any filter has been applied to them. This is costly. 
Instead it's better to only materialize these column vectors when the data are 
actually read.)

> Implement lazy materialization for the vectorized Parquet reader
> ----------------------------------------------------------------
>
>                 Key: SPARK-36527
>                 URL: https://issues.apache.org/jira/browse/SPARK-36527
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Priority: Major
>
> At the moment the Parquet vectorized reader will eagerly decode all the 
> columns that are in the read schema, before any filter has been applied to 
> them. This is costly. Instead it's better to only materialize these column 
> vectors when the data are actually needed.
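
The idea in the description can be illustrated with a minimal sketch. This is
not Spark's implementation: the names LazyColumnVector, scan, and decode_count
are hypothetical and exist only for this example. It assumes a batch is a
mapping from column name to a vector whose values are decoded on first access,
so a column that the filter never needs is never decoded.

```python
from typing import Callable, Dict, List, Optional


class LazyColumnVector:
    """Hypothetical column vector that decodes its values on first access."""

    def __init__(self, name: str, decode: Callable[[], List[int]]) -> None:
        self.name = name
        self._decode = decode          # stands in for the Parquet page decoder
        self._values: Optional[List[int]] = None
        self.decode_count = 0          # how many times decoding actually ran

    def values(self) -> List[int]:
        # Decode only once, and only when a caller asks for the data.
        if self._values is None:
            self._values = self._decode()
            self.decode_count += 1
        return self._values


def scan(
    batch: Dict[str, LazyColumnVector],
    filter_column: str,
    predicate: Callable[[int], bool],
    projected: List[str],
) -> List[Dict[str, int]]:
    """Apply the filter first, then materialize the projected columns."""
    filter_values = batch[filter_column].values()
    selected = [i for i, v in enumerate(filter_values) if predicate(v)]
    if not selected:
        # No row survived the filter, so the other columns stay undecoded.
        return []
    return [
        {name: batch[name].values()[i] for name in projected}
        for i in selected
    ]


def make_batch() -> Dict[str, LazyColumnVector]:
    # Two small in-memory columns used as stand-ins for encoded Parquet data.
    return {
        "id": LazyColumnVector("id", lambda: [1, 2, 3, 4]),
        "payload": LazyColumnVector("payload", lambda: [10, 20, 30, 40]),
    }
```

With an eager reader both columns would be decoded for every batch. In this
sketch, a batch whose rows are all rejected by the filter only pays for
decoding the filter column.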



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

