[
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122708#comment-16122708
]
Junjie Chen commented on HIVE-17261:
------------------------------------
Actually, Hive uses two deprecated Parquet APIs: one is the ParquetInputSplit
constructor, the other is filterRowGroup. They were deprecated because Parquet
introduced a new dictionary filter. The key point here is how to leverage both
the statistics filter and the dictionary filter. In the existing code, Hive
explicitly applies the statistics filter on the Hive side.
To apply both the statistics and dictionary filters, we can either explicitly
change how Hive calls the filterRowGroup API, or pass the predicate through the
job configuration to Parquet and let Parquet do the filtering. The patch I
provided passes the predicate to Parquet and skips the explicit filtering on
the Hive side.
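
To illustrate why applying both filters matters, here is a minimal,
self-contained sketch. It is a toy model only: the RowGroup class, the helper
methods, and the sample data are hypothetical and are not the Parquet or Hive
API. It assumes an equality predicate on one integer column, per-row-group
min/max statistics, and a dictionary that lists every distinct value in the
row group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical classes, not the Parquet or Hive API.
class RowGroupFilterSketch {

    // A row group with min/max statistics and the distinct values
    // of a fully dictionary-encoded column.
    static final class RowGroup {
        final String name;
        final int min;
        final int max;
        final Set<Integer> dictionary;

        RowGroup(String name, int min, int max, Integer... dictionaryValues) {
            this.name = name;
            this.min = min;
            this.max = max;
            this.dictionary = new HashSet<>(Arrays.asList(dictionaryValues));
        }
    }

    // Statistics filter: the group may contain value only if min <= value <= max.
    static boolean statisticsMayMatch(RowGroup g, int value) {
        return g.min <= value && value <= g.max;
    }

    // Dictionary filter: the group may contain value only if the dictionary lists it.
    static boolean dictionaryMayMatch(RowGroup g, int value) {
        return g.dictionary.contains(value);
    }

    // Keeps the row groups that survive the statistics filter alone.
    static List<RowGroup> keepByStatistics(List<RowGroup> groups, int value) {
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup g : groups) {
            if (statisticsMayMatch(g, value)) {
                kept.add(g);
            }
        }
        return kept;
    }

    // Keeps the row groups that survive both filters.
    static List<RowGroup> keepByBoth(List<RowGroup> groups, int value) {
        List<RowGroup> kept = new ArrayList<>();
        for (RowGroup g : keepByStatistics(groups, value)) {
            if (dictionaryMayMatch(g, value)) {
                kept.add(g);
            }
        }
        return kept;
    }

    // Made-up sample data for a predicate such as "col = 50".
    static List<RowGroup> sampleGroups() {
        return Arrays.asList(
            new RowGroup("rg0", 1, 40, 1, 20, 40),    // min/max already exclude 50
            new RowGroup("rg1", 10, 90, 10, 30, 90),  // min/max admit 50, dictionary does not
            new RowGroup("rg2", 45, 60, 45, 50, 60)); // really contains 50
    }

    public static void main(String[] args) {
        List<RowGroup> groups = sampleGroups();
        System.out.println("statistics only: " + keepByStatistics(groups, 50).size());
        System.out.println("statistics + dictionary: " + keepByBoth(groups, 50).size());
    }
}
```

In this sample, rg1 passes the min/max check but its dictionary does not
contain the value, so only the dictionary filter can skip it. That is the
kind of pruning that is lost when the dictionary filter is skipped.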
> Hive use deprecated ParquetInputSplit constructor which blocked parquet
> dictionary filter
> -----------------------------------------------------------------------------------------
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
> Issue Type: Improvement
> Components: Database/Schema
> Affects Versions: 2.2.0
> Reporter: Junjie Chen
> Assignee: Junjie Chen
> Priority: Minor
> Attachments: HIVE-17261.2.patch, HIVE-17261.diff, HIVE-17261.patch
>
>
> Hive uses the deprecated ParquetInputSplit constructor in
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old interface sets the rowgroupoffset values, which causes Parquet to
> skip the dictionary filter.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)