Hi folks,

I have started writing a design doc about page skipping based on the Parquet
page index
<https://github.com/apache/parquet-format/blob/master/PageIndex.md>.

Impala has been writing the page index since versions 2.13 and 3.1
(IMPALA-5842 <https://issues.apache.org/jira/browse/IMPALA-5842>).

Reading the page index and implementing the filtering is much trickier
than writing the index, so we decided to start with a high-level doc rather
than the code itself.
Special thanks to Tim, Lars, and Csaba for their insights!

You can find the doc here; all comments are welcome:
https://docs.google.com/document/d/1D-el8njq_I-JKd3NDcW1mRXID_n0dBDKIkjWxwULVus/edit?usp=sharing

Thanks,
    Zoltan
