[ https://issues.apache.org/jira/browse/PARQUET-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Blue resolved PARQUET-374.
-------------------------------
    Resolution: Won't Fix

I'm marking this as "Won't fix" because PARQUET-384 includes the proposed API 
for accessing dictionaries.

> Add api to read dictionary from each column chunk for predicate pushdown
> ------------------------------------------------------------------------
>
>                 Key: PARQUET-374
>                 URL: https://issues.apache.org/jira/browse/PARQUET-374
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Zhenxiao Luo
>            Assignee: Zhenxiao Luo
>
> A Parquet file's dictionaries could be used for predicate pushdown.
> For example, the SQL query:
> select * from table where column = 10;
> could skip reading the whole row group if the dictionary for column has 
> values [5, 11, 17, 20].
> This could save IO and improve performance.
> We implemented predicate pushdown using dictionaries in Presto for Parquet 
> files, and benchmarks show up to a 40X speedup for selective queries.
> We need to add an API to ParquetFileReader that returns dictionaries for 
> the requested columns.
> If the column is not dictionary encoded in this row group, return null.
> If not all of the column's pages are dictionary encoded in this row group, 
> return null.
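
The skip decision described in the quoted issue can be sketched in plain Java. This is only an illustration of the idea, not the API from PARQUET-374 or PARQUET-384: the class name DictionaryFilterSketch and the method canSkipRowGroup are hypothetical, and the dictionary is modeled as a Set<Integer> rather than any parquet-mr type. A null dictionary stands for the two "return null" cases above, where the row group cannot be ruled out.

```java
import java.util.Set;

// Hypothetical sketch; none of these names come from parquet-mr.
final class DictionaryFilterSketch {

    /**
     * Decides whether a row group can be skipped for the predicate
     * "column = value".
     *
     * @param dictionary distinct values of the column in this row group, or
     *                   null if the column is not (fully) dictionary encoded
     * @param value      the literal the column is compared against
     * @return true only when the dictionary proves that no row can match
     */
    public static boolean canSkipRowGroup(Set<Integer> dictionary, int value) {
        if (dictionary == null) {
            // No complete dictionary: some pages may hold values we cannot
            // see here, so the row group has to be read.
            return false;
        }
        // Every value in the row group appears in the dictionary, so a
        // missing literal means no row satisfies the equality predicate.
        return !dictionary.contains(value);
    }

    public static void main(String[] args) {
        Set<Integer> dictionary = Set.of(5, 11, 17, 20);
        System.out.println(canSkipRowGroup(dictionary, 10)); // true
        System.out.println(canSkipRowGroup(dictionary, 11)); // false
        System.out.println(canSkipRowGroup(null, 10));       // false
    }
}
```

With the dictionary [5, 11, 17, 20] from the example, the query "column = 10" can skip the row group, while "column = 11" cannot. Returning false for a null dictionary keeps the filter conservative: a row group is skipped only when the dictionary covers every page of the column chunk.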



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
