[
https://issues.apache.org/jira/browse/HIVE-22959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071127#comment-17071127
]
Panagiotis Garefalakis edited comment on HIVE-22959 at 3/30/20, 4:44 PM:
-------------------------------------------------------------------------
Hey [~omalley], the idea here is to abstract the information needed (by
data-format consumers) to enable more fine-grained filtering (e.g., ORC-577).
You are right, VRB does contain similar information, but the problem is that
not all consumers make use of VRB. For example, in Hive we are currently using
batches of
[ColumnVectors](https://github.com/apache/hive/blob/aa94b8d5cefc332c7269a0d8857a9778b9fe1b0c/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java)
instead.
The proposed MutableFilterContext also provides some optimizations, like the
borrowSelected method to reuse the allocated selected array across filters, and
exposes an immutable context by default to make it harder for API users to
modify the context values when they shouldn't.
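To make the shape of the proposal concrete, here is a minimal sketch of the idea, not the code from the attached patches. ReadOnlyFilterContext, SketchFilterContext, setFilter and reset are hypothetical names chosen for illustration, and the borrowSelected signature shown is an assumption; only the three pieces of state and the borrow-and-reuse idea come from this issue.

```java
// Hypothetical read-only view handed to consumers (illustrative names only,
// not the storage-api signatures).
interface ReadOnlyFilterContext {
    boolean isSelectedInUse();   // whether the filter is enabled
    int[] getSelected();         // row ids that passed the filter
    int getSelectedSize();       // number of rows that passed the filter
}

// Hypothetical mutable implementation, meant to be held by the filter itself.
final class SketchFilterContext implements ReadOnlyFilterContext {
    private boolean selectedInUse = false;
    private int[] selected = new int[0];
    private int selectedSize = 0;

    @Override public boolean isSelectedInUse() { return selectedInUse; }
    @Override public int[] getSelected() { return selected; }
    @Override public int getSelectedSize() { return selectedSize; }

    // Assumed signature: hand out the backing array so the next filter can
    // reuse it, allocating a new one only when the current one is too small.
    int[] borrowSelected(int minCapacity) {
        if (selected.length < minCapacity) {
            selected = new int[minCapacity];
        }
        return selected;
    }

    // Record the outcome of a filter: which rows passed and how many.
    void setFilter(int[] rows, int size) {
        if (size < 0 || size > rows.length) {
            throw new IllegalArgumentException("size out of range: " + size);
        }
        selectedInUse = true;
        selected = rows;
        selectedSize = size;
    }

    // Disable the filter but keep the array around for later reuse.
    void reset() {
        selectedInUse = false;
        selectedSize = 0;
    }

    // Expose only the read-only interface to API users.
    ReadOnlyFilterContext immutable() {
        return this;
    }
}
```

In this sketch a consumer that only receives the result of immutable() sees no setters, which makes accidental modification harder, though not impossible, since the returned int array itself is still writable.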
was (Author: pgaref):
Hey [~omalley], the idea here is to abstract the information needed (by
data-format consumers) to enable more fine-grained filtering (e.g., ORC-611).
You are right, VRB does contain similar information, but the problem is that
not all consumers make use of VRB. For example, in Hive we are currently using
batches of
[ColumnVectors](https://github.com/apache/hive/blob/aa94b8d5cefc332c7269a0d8857a9778b9fe1b0c/llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java)
instead.
The proposed MutableFilterContext also provides some optimizations, like the
borrowSelected method to reuse the allocated selected array across filters, and
exposes an immutable context by default to make it harder for API users to
modify the context values when they shouldn't.
> Extend storage-api to expose FilterContext
> ------------------------------------------
>
> Key: HIVE-22959
> URL: https://issues.apache.org/jira/browse/HIVE-22959
> Project: Hive
> Issue Type: Sub-task
> Components: storage-api
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, storage-2.7.2
>
> Attachments: HIVE-22959.1.patch, HIVE-22959.2.patch,
> HIVE-22959.3.patch, HIVE-22959.4.patch
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> To enable row-level filtering at the ORC level (ORC-577), or as an extension
> for ProDecode MapJoin (HIVE-22731), we need a common context class that will
> hold all the information needed by the filter.
> I propose that this class be part of the storage-api, similar to the
> VectorizedRowBatch class, and hold the information below:
> * A boolean variable showing whether the filter is enabled
> * An int array storing the row IDs that are actually selected (passing the
> filter)
> * An int variable storing the number of rows that passed the filter
>