metadata based predicate pushdown

Mohit Jaggi Fri, 06 Feb 2015 06:37:12 -0800

Hi Parquet Developers,
I have a use case where I may repeatedly (but from different processes) “query” 
a large parquet file for specific rows. The query is a filter on one of the 
columns, and that column is just an increasing integer (e.g. 1, 2, 3, 4…). If I 
naively use predicate pushdown, the whole file will be scanned for every query, 
right? But the file has enough metadata to let me skip “pages” and “row 
groups” that don’t contain a match. Is there an API that I can use to skip over 
those “row groups” and “pages” and scan only the pages that hold the row I am 
looking for? I saw references to “metadata based predicate pushdown” and 
“indexes in parquet 2.0”, so I guess such APIs do exist.
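
To make the kind of skipping I have in mind concrete, here is a minimal sketch in plain Python. It does not use any Parquet API: RowGroupStats and row_groups_to_scan are hypothetical names, and the min/max values are made up for illustration. It only assumes that each row group records the minimum and maximum of the filtered column.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group statistics for the filtered column."""
    index: int
    min_value: int
    max_value: int


def row_groups_to_scan(stats: List[RowGroupStats], target: int) -> List[int]:
    """Return the indexes of row groups whose [min, max] range can contain target."""
    return [rg.index for rg in stats if rg.min_value <= target <= rg.max_value]


# Made-up statistics for a file whose column is an increasing integer.
stats = [
    RowGroupStats(index=0, min_value=1, max_value=1000),
    RowGroupStats(index=1, min_value=1001, max_value=2000),
    RowGroupStats(index=2, min_value=2001, max_value=3000),
]

# Only row group 1 can hold the value 1500; the other two can be skipped.
print(row_groups_to_scan(stats, 1500))
```

Because the column is increasing, the ranges do not overlap, so at most one row group should need to be read per query. The same check could in principle be repeated per page if page-level statistics are available.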


Thanks for your help,
Mohit.
