[GitHub] [incubator-iceberg] rdblue commented on issue #193: Support Page Skipping in Iceberg Parquet Reader

GitBox Thu, 19 Sep 2019 12:56:49 -0700

rdblue commented on issue #193: Support Page Skipping in Iceberg Parquet Reader
URL: 
https://github.com/apache/incubator-iceberg/issues/193#issuecomment-533285229
 
 
   Some engines would need new code to take advantage of the column indexes. 
Each client that reads an Iceberg table uses its own readers for Parquet data. 
The Spark readers are in the iceberg-spark module and construct records 
through a different read path than the one built into Parquet, so we would 
need to add support for page indexes there.
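
   To make the idea concrete, here is a minimal sketch of the pruning step 
that page skipping relies on. It assumes each page carries min/max values (as 
in Parquet's column index) and a first-row position (as in the offset index). 
`PageStats`, `pages_to_read`, and `row_ranges` are hypothetical names used for 
illustration only; they are not Iceberg or Parquet APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageStats:
    """Per-page statistics (hypothetical stand-in for column/offset index entries)."""
    first_row: int   # position of the page's first row within the row group
    num_rows: int    # number of rows stored in the page
    min_value: int   # smallest value in the page
    max_value: int   # largest value in the page


def pages_to_read(pages: List[PageStats], lo: int, hi: int) -> List[int]:
    """Return indexes of pages whose [min, max] range overlaps the filter lo <= x <= hi."""
    return [i for i, p in enumerate(pages)
            if p.max_value >= lo and p.min_value <= hi]


def row_ranges(pages: List[PageStats], keep: List[int]) -> List[Tuple[int, int]]:
    """Merge the kept pages into half-open [start, end) row ranges."""
    ranges: List[Tuple[int, int]] = []
    for i in keep:
        start = pages[i].first_row
        end = start + pages[i].num_rows
        if ranges and ranges[-1][1] == start:
            # Adjacent to the previous range: extend it instead of adding a new one.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


pages = [
    PageStats(first_row=0, num_rows=100, min_value=1, max_value=50),
    PageStats(first_row=100, num_rows=100, min_value=51, max_value=120),
    PageStats(first_row=200, num_rows=100, min_value=121, max_value=300),
    PageStats(first_row=300, num_rows=50, min_value=301, max_value=400),
]
keep = pages_to_read(pages, lo=100, hi=150)
print(keep, row_ranges(pages, keep))
```

   The row ranges matter because pages in different columns do not share row 
boundaries, so a reader that builds its own records has to apply the same row 
ranges to every projected column.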
   
   In the long term, we want this optimization to be part of the vectorized 
read path, because we think that reading into Arrow first (vectorized) and 
then converting to an in-memory row format will be faster than deserializing 
directly to the row format. That means we should invest in adding page skipping 
to the Parquet-to-Arrow conversion.
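
   As a rough illustration of that two-step shape, the sketch below reads 
only the surviving row ranges into per-column value lists and then converts 
the batch to rows. Plain Python lists stand in for Arrow vectors here, and 
`read_column_batch` and `batch_to_rows` are hypothetical helpers, not Iceberg, 
Parquet, or Arrow APIs.

```python
from typing import Any, Dict, List, Tuple


def read_column_batch(columns: Dict[str, List[Any]],
                      ranges: List[Tuple[int, int]]) -> Dict[str, List[Any]]:
    """Step 1: materialize only the rows in the half-open [start, end) ranges, column by column."""
    batch: Dict[str, List[Any]] = {}
    for name, values in columns.items():
        kept: List[Any] = []
        for start, end in ranges:
            kept.extend(values[start:end])
        batch[name] = kept
    return batch


def batch_to_rows(batch: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Step 2: convert the columnar batch into a list of row records."""
    names = list(batch)
    return [dict(zip(names, vals)) for vals in zip(*(batch[n] for n in names))]


columns = {"id": [0, 1, 2, 3, 4, 5], "v": ["a", "b", "c", "d", "e", "f"]}
batch = read_column_batch(columns, [(1, 3), (5, 6)])
print(batch_to_rows(batch))
```

   With this split, skipping happens once in the columnar step and the row 
conversion never sees the skipped rows.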


