Hi folks,

I have started writing a design doc about page skipping based on the Parquet page index <https://github.com/apache/parquet-format/blob/master/PageIndex.md>.
Impala has been writing the page index since versions 2.13 and 3.1 (IMPALA-5842 <https://issues.apache.org/jira/browse/IMPALA-5842>). Reading the page index and implementing the filtering is much trickier than writing the index, so we decided to start with a high-level doc rather than with the code itself.

Special thanks to Tim, Lars, and Csaba for their insights!

You can find the doc here, and all comments are welcome:
https://docs.google.com/document/d/1D-el8njq_I-JKd3NDcW1mRXID_n0dBDKIkjWxwULVus/edit?usp=sharing

Thanks,
Zoltan