Chao Sun created SPARK-36527:
--------------------------------
Summary: Implement lazy materialization for the vectorized Parquet reader
Key: SPARK-36527
URL: https://issues.apache.org/jira/browse/SPARK-36527
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.3.0
Reporter: Chao Sun
At the moment, the vectorized Parquet reader eagerly decodes every column in
the read schema before any filter has been applied to them. This is costly,
because values in rows that a filter later discards are decoded for nothing.
It would be better to materialize a column vector only when its data is
actually read.
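
To illustrate the idea, here is a minimal Python sketch of lazy materialization. It is not Spark's implementation: LazyColumnVector, decode_plain, and scan_batch are hypothetical names invented for this example, and the "encoding" is a stand-in of one byte per value rather than a real Parquet encoding. The sketch assumes the filter column is decoded first and the remaining columns are decoded only if at least one row passes the filter.

```python
from typing import Callable, Dict, List, Optional, Tuple


class LazyColumnVector:
    """Hypothetical column vector that defers decoding until first access."""

    def __init__(self, name: str, decode: Callable[[], List[int]]):
        self.name = name
        self._decode = decode  # closure over the still-encoded data
        self._values: Optional[List[int]] = None
        self.decode_count = 0  # instrumentation for this sketch only

    def values(self) -> List[int]:
        # Decode at most once, and only when a consumer actually asks.
        if self._values is None:
            self._values = self._decode()
            self.decode_count += 1
        return self._values


def decode_plain(encoded: bytes) -> List[int]:
    # Stand-in for real Parquet decoding: one unsigned byte per value.
    return list(encoded)


def scan_batch(
    columns: Dict[str, LazyColumnVector],
    filter_column: str,
    predicate: Callable[[int], bool],
    projected: List[str],
) -> List[Tuple[int, ...]]:
    """Return the projected rows whose filter_column value passes predicate."""
    keys = columns[filter_column].values()  # only the filter column is decoded here
    selected = [i for i, v in enumerate(keys) if predicate(v)]
    if not selected:
        return []  # no row survived, so the other columns are never decoded
    out_cols = [columns[name].values() for name in projected]
    return [tuple(col[i] for col in out_cols) for i in selected]


def make_batch() -> Dict[str, LazyColumnVector]:
    return {
        "id": LazyColumnVector("id", lambda: decode_plain(bytes([1, 2, 3, 4]))),
        "payload": LazyColumnVector(
            "payload", lambda: decode_plain(bytes([10, 20, 30, 40]))
        ),
    }


# Filter rejects every row: "payload" is never materialized.
skipped = make_batch()
skipped_rows = scan_batch(skipped, "id", lambda v: v > 100, ["id", "payload"])

# Filter keeps two rows: "payload" is materialized exactly once.
kept = make_batch()
kept_rows = scan_batch(kept, "id", lambda v: v >= 3, ["id", "payload"])
```

In the first scan the payload column is never decoded, which is the saving the description above is after. A real reader would also have to deal with row groups, pages, and definition levels, none of which are modeled here.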