rdblue commented on issue #193: Support Page Skipping in Iceberg Parquet Reader
URL: https://github.com/apache/incubator-iceberg/issues/193#issuecomment-533285229

We would need to implement something to take advantage of the column indexes in some engines. Any client using an Iceberg table uses its own readers for Parquet data. Spark readers are in the iceberg-spark module and use a different read path to construct records than the one built into Parquet, so we would need to add support for page indexes there.

In the long term, we want this optimization to be part of the vectorized reads because we think that reading first into Arrow (vectorized) and then converting to an in-memory row format is going to be faster than deserializing directly to the row format. That means we should invest in adding page skipping to the Parquet to Arrow conversion.
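
To make the idea of page skipping concrete, here is a minimal sketch of the selection step a reader could run before decoding pages: given per-page min/max values for a column, keep only the pages whose range can overlap the filter. Everything named here (`PageSkippingSketch`, `PageStats`, `pagesToRead`) is hypothetical and for illustration only; it is not the Iceberg or Parquet API, and a real reader would take these statistics from the Parquet column index rather than from a hand-built list.

```java
import java.util.ArrayList;
import java.util.List;

public class PageSkippingSketch {
    /** Hypothetical stand-in for one page's entry in a column index. */
    static final class PageStats {
        final long min;
        final long max;

        PageStats(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Returns the indexes of pages whose [min, max] range overlaps the
     * inclusive filter range [lower, upper]. Pages not returned cannot
     * contain a matching value and could be skipped without decoding.
     */
    static List<Integer> pagesToRead(List<PageStats> pages, long lower, long upper) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            PageStats p = pages.get(i);
            if (p.max >= lower && p.min <= upper) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // Assumed example data: four pages of a sorted long column.
        List<PageStats> pages = List.of(
            new PageStats(0, 9),
            new PageStats(10, 19),
            new PageStats(20, 29),
            new PageStats(30, 39));

        // Filter: 25 <= value <= 35
        System.out.println(pagesToRead(pages, 25, 35)); // prints [2, 3]
    }
}
```

In a vectorized path, the same selection would decide which pages are converted to Arrow at all, so the skipped pages never reach the row conversion step.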
