[
https://issues.apache.org/jira/browse/HIVE-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171984#comment-14171984
]
Ferdinand Xu commented on HIVE-8122:
------------------------------------
Hi [~brocknoland], I'm currently working on this jira and have done some
investigation on both the Parquet and Hive sides.
As I understand it, SearchArgument is a kind of predicate pushdown framework
or mechanism. The latest Parquet project already supports vertical and
horizontal partitions, and it uses FilterPredicate, which is similar to
SearchArgument. See
https://github.com/apache/incubator-parquet-mr/blob/0148455170be07f89bd6b9230960a6cd510c7ca6/parquet-column/src/main/java/parquet/filter2/predicate/FilterPredicate.java
I found an issue with the current filter solution on the Hive side. Hive
implements filter pushdown by putting FILTER_EXPR_CONF_STR and
FILTER_TEXT_CONF_STR into the conf and then passing it to ParquetInputFormat.
However, Parquet reads the FILTER_PREDICATE configuration, which holds a
serialized FilterPredicate.
Was this jira filed to enable the FilterPredicate feature provided by Parquet
in the Hive code? If so, maybe we can use the mechanism from Parquet by
creating a FilterPredicate in the Hive code. See
https://github.com/apache/incubator-parquet-mr/blob/5dafd127f3de7c516ce9c1b7329087a01ab2fc57/parquet-hadoop/src/main/java/parquet/hadoop/ParquetInputFormat.java#L163
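To illustrate the mismatch I mean, here is a minimal, self-contained sketch. It
is not Hive or Parquet code: a plain Map stands in for the Hadoop conf, the key
strings are placeholders named after the constants above (the real values are
not reproduced), and hiveSideConf, parquetSeesFilter and bridge are
hypothetical helpers. Real code would build an actual FilterPredicate and
serialize it instead of using placeholder strings.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for the Hadoop conf, and every method
// below is hypothetical rather than Hive or Parquet API.
class FilterKeyMismatchSketch {
    // Placeholder keys named after the constants discussed above; the real
    // string values of those constants are not reproduced here.
    static final String FILTER_EXPR_CONF_STR = "FILTER_EXPR_CONF_STR";
    static final String FILTER_TEXT_CONF_STR = "FILTER_TEXT_CONF_STR";
    static final String FILTER_PREDICATE = "FILTER_PREDICATE";

    // What the Hive side does today: store the filter under the two Hive keys.
    static Map<String, String> hiveSideConf(String serializedExpr, String exprText) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FILTER_EXPR_CONF_STR, serializedExpr);
        conf.put(FILTER_TEXT_CONF_STR, exprText);
        return conf;
    }

    // What the Parquet side looks for: only the FILTER_PREDICATE key.
    static boolean parquetSeesFilter(Map<String, String> conf) {
        return conf.containsKey(FILTER_PREDICATE);
    }

    // The proposed bridge: Hive also stores a predicate under the key that
    // Parquet reads (a placeholder string here, a serialized
    // FilterPredicate in real code).
    static void bridge(Map<String, String> conf, String serializedPredicate) {
        conf.put(FILTER_PREDICATE, serializedPredicate);
    }

    public static void main(String[] args) {
        Map<String, String> conf = hiveSideConf("<serialized expr>", "id = 5");
        System.out.println("before bridge: " + parquetSeesFilter(conf));
        bridge(conf, "<serialized FilterPredicate>");
        System.out.println("after bridge: " + parquetSeesFilter(conf));
    }
}
```

With only the two Hive keys set, the Parquet-side lookup finds nothing; once a
value is stored under FILTER_PREDICATE, it does.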
Please feel free to point out anything I have wrong.
> Make use of SearchArgument classes
> ----------------------------------
>
> Key: HIVE-8122
> URL: https://issues.apache.org/jira/browse/HIVE-8122
> Project: Hive
> Issue Type: Sub-task
> Reporter: Brock Noland
> Assignee: Ferdinand Xu
>
> ParquetSerde could be much cleaner if we used SearchArgument and associated
> classes like ORC does:
> https://github.com/apache/hive/blob/trunk/serde/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)