Hi Parquet Developers,

I have a use case where I may repeatedly (but from different processes) "query" a large Parquet file for specific rows. The query is a filter on one of the columns, and that column is just an increasing integer (e.g. 1, 2, 3, 4, ...).

If I naively use predicate pushdown, the whole file will be scanned for every query, right? But the file should contain enough metadata to let me skip the "pages" and "row groups" that cannot contain a match. Is there an API that I can use to skip over those "row groups" and "pages" and scan only the pages that contain the row I am looking for? I saw references to "metadata based predicate pushdown" and "indexes in Parquet 2.0", so I guess such APIs do exist.
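
To make the question concrete, here is a minimal sketch of the kind of skipping I have in mind. It is plain Python and does not use any Parquet API: `RowGroupStats` and `find_candidate_row_group` are hypothetical names, and the min/max values are made up for illustration. It assumes each row group records min/max statistics for the filter column and that, because the column is increasing, the row-group ranges are ordered and do not overlap.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for the min/max statistics of one row group."""
    index: int
    min_value: int
    max_value: int


def find_candidate_row_group(stats: List[RowGroupStats], target: int) -> Optional[int]:
    """Return the index of the only row group that can contain `target`, or None.

    Assumes the filter column is sorted, so row-group ranges are ordered
    and non-overlapping; only the metadata is inspected, never the data pages.
    """
    maxima = [s.max_value for s in stats]
    # First row group whose maximum is >= target.
    pos = bisect_left(maxima, target)
    if pos < len(stats) and stats[pos].min_value <= target:
        return stats[pos].index
    return None


# Made-up statistics for three row groups of 1000 rows each.
stats = [
    RowGroupStats(0, 1, 1000),
    RowGroupStats(1, 1001, 2000),
    RowGroupStats(2, 2001, 3000),
]

# Only this one row group would need to be read for the value 1500.
candidate = find_candidate_row_group(stats, 1500)
```

What I am looking for is the real API that gives me this behaviour, ideally at page granularity as well as row-group granularity.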

Thanks for your help,
Mohit
