Zhenxiao Luo created PARQUET-374:
------------------------------------

             Summary: Add api to read dictionary from each column chunk for 
predicate pushdown
                 Key: PARQUET-374
                 URL: https://issues.apache.org/jira/browse/PARQUET-374
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
            Reporter: Zhenxiao Luo
            Assignee: Zhenxiao Luo


A Parquet file's column dictionaries can be used for predicate pushdown.
For example, the SQL query:
select * from table where column = 10;
could skip reading the whole row group if the dictionary for column contains 
only the values [5, 11, 17, 20], because 10 cannot occur in that row group.
This saves IO and improves performance.
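
The check described above can be sketched in a few lines of Java. This is only 
an illustration of the idea, not parquet-mr or Presto code: the class name 
DictionaryFilterSketch and the method canSkipRowGroup are hypothetical, and the 
dictionary is modelled as a plain Set<Integer>.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DictionaryFilterSketch {

    /**
     * Decides whether a row group can be skipped for the predicate "column = value".
     * A null dictionary means no complete dictionary is available for the column
     * chunk, so the row group must be read.
     */
    public static boolean canSkipRowGroup(Set<Integer> dictionary, int value) {
        if (dictionary == null) {
            return false; // nothing is known about the values, cannot skip
        }
        // If every value in the chunk comes from the dictionary and the dictionary
        // does not contain the literal, no row in this row group can match.
        return !dictionary.contains(value);
    }

    public static void main(String[] args) {
        Set<Integer> dictionary = new HashSet<>(Arrays.asList(5, 11, 17, 20));
        System.out.println(canSkipRowGroup(dictionary, 10)); // true: 10 is absent
        System.out.println(canSkipRowGroup(dictionary, 11)); // false: 11 may match
        System.out.println(canSkipRowGroup(null, 10));       // false: no dictionary
    }
}
```

The same test would extend to other predicates (for example IN lists or range 
comparisons) by checking whether any dictionary value satisfies the predicate.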

We implemented dictionary-based predicate pushdown for Parquet files in Presto, 
and our benchmarks show up to a 40X speedup for selective queries.

We need to add an API to ParquetFileReader that returns the dictionaries for 
the requested columns.
If a column is not dictionary encoded in this row group, return null.
If not all of a column's pages are dictionary encoded in this row group, return 
null.
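
A minimal sketch of the return contract described above, using hypothetical 
stand-in types rather than the real parquet-mr classes (DictionaryReadSketch, 
ColumnChunk, Encoding and readDictionary are all assumptions made for 
illustration; the actual ParquetFileReader method is yet to be designed). The 
second null case matters because a writer may fall back to plain encoding 
partway through a column chunk, in which case later pages can hold values that 
are not in the dictionary.

```java
import java.util.Arrays;
import java.util.List;

public class DictionaryReadSketch {

    /** Hypothetical page encodings; real Parquet has more than two. */
    public enum Encoding { PLAIN, DICTIONARY }

    /** Hypothetical stand-in for one column chunk inside a row group. */
    public static final class ColumnChunk {
        final List<Encoding> dataPageEncodings;
        final List<Integer> dictionaryValues; // null when there is no dictionary page

        public ColumnChunk(List<Encoding> dataPageEncodings, List<Integer> dictionaryValues) {
            this.dataPageEncodings = dataPageEncodings;
            this.dictionaryValues = dictionaryValues;
        }
    }

    /**
     * Returns the dictionary only when it covers every value in the chunk,
     * otherwise null, following the two rules stated in this issue.
     */
    public static List<Integer> readDictionary(ColumnChunk chunk) {
        if (chunk.dictionaryValues == null) {
            return null; // column is not dictionary encoded in this row group
        }
        for (Encoding encoding : chunk.dataPageEncodings) {
            if (encoding != Encoding.DICTIONARY) {
                return null; // some pages are not dictionary encoded
            }
        }
        return chunk.dictionaryValues;
    }

    public static void main(String[] args) {
        ColumnChunk full = new ColumnChunk(
                Arrays.asList(Encoding.DICTIONARY, Encoding.DICTIONARY),
                Arrays.asList(5, 11, 17, 20));
        ColumnChunk mixed = new ColumnChunk(
                Arrays.asList(Encoding.DICTIONARY, Encoding.PLAIN),
                Arrays.asList(5, 11, 17, 20));
        System.out.println(readDictionary(full));          // the full dictionary
        System.out.println(readDictionary(mixed) == null); // true: a plain page exists
    }
}
```

Returning null in both cases lets a caller treat "no usable dictionary" 
uniformly and fall back to reading the row group.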



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
