[ https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Panagiotis Garefalakis updated HIVE-22731:
------------------------------------------
Issue Type: Improvement (was: Bug)
> Probe MapJoin hashtables for row level filtering
> ------------------------------------------------
>
> Key: HIVE-22731
> URL: https://issues.apache.org/jira/browse/HIVE-22731
> Project: Hive
> Issue Type: Improvement
> Components: Hive, llap
> Reporter: Panagiotis Garefalakis
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: HIVE-22731.1.patch, HIVE-22731.WIP.patch, decode_time_bars.pdf
>
>
> Currently, RecordReaders such as ORC support filtering only at coarse-grained
> levels, namely the File, Stripe (64 to 256 MB), and Row group (10k rows) levels.
> They filter out a set of rows only if they can guarantee that none of its rows
> can pass the filter (usually given as a search argument, SARG).
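To illustrate what "coarse-grained" means here, the following is a minimal sketch of min/max based row-group pruning for a single equality predicate. The class `RowGroupPruningSketch`, the record `RowGroupStats`, and the method `candidateGroups` are hypothetical names for illustration, not the ORC reader API. A group is skipped only when the key falls outside its [min, max] range, so every surviving group must still be decoded in full even if none of its rows actually match.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupPruningSketch {
    // Hypothetical min/max statistics of one long column in one row group
    // (an assumption for illustration; not ORC's statistics classes).
    record RowGroupStats(int index, long min, long max) {}

    // Returns the indexes of row groups that *might* contain `key`.
    // Pruning is conservative: a group is dropped only when key lies outside
    // [min, max], i.e. when no row in it can possibly match.
    static List<Integer> candidateGroups(List<RowGroupStats> stats, long key) {
        List<Integer> result = new ArrayList<>();
        for (RowGroupStats s : stats) {
            if (key >= s.min() && key <= s.max()) {
                result.add(s.index());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<RowGroupStats> stats = List.of(
                new RowGroupStats(0, 1, 100),
                new RowGroupStats(1, 101, 200),
                new RowGroupStats(2, 50, 150));
        System.out.println(candidateGroups(stats, 120L)); // [1, 2]
        System.out.println(candidateGroups(stats, 300L)); // []
    }
}
```

Note that group 2 survives for key 120 purely because its range overlaps; every one of its rows is then decoded, which is the cost the rest of this description targets.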
> However, a significant amount of time can be spent decoding the columns of
> rows that never make it into the final result. See the attached figure, where
> "original" is today's behavior and "LazyDecode" skips decoding rows that do
> not match any key.
> To enable more fine-grained filtering in the particular case of a MapJoin,
> we could use the key HashTable built from the smaller table to skip
> deserializing the columns of larger-table rows that do not match any key,
> and thus save CPU time.
> This Jira investigates this direction.
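The probe-then-decode idea can be sketched as follows, under simplifying assumptions: the join key column is already decoded into a `long[]`, a `java.util.Set<Long>` stands in for the MapJoin key HashTable, and each non-key value is a separately encoded `byte[]`. All names (`LazyDecodeSketch`, `probe`, `decodeSelected`, `decodePayload`) are hypothetical and are not Hive's implementation. The selected-row index array loosely mirrors the `selected` array of Hive's `VectorizedRowBatch`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class LazyDecodeSketch {
    // Counts how many non-key values were actually decoded (illustration only).
    static int decodeCalls = 0;

    // Hypothetical stand-in for an expensive column decoder.
    static String decodePayload(byte[] encoded) {
        decodeCalls++;
        return new String(encoded, StandardCharsets.UTF_8);
    }

    // Step 1: probe the small-table keys with the (cheaply decoded) join key
    // column and return the indexes of rows that have a matching key.
    static int[] probe(long[] keyColumn, Set<Long> smallTableKeys) {
        int[] selected = new int[keyColumn.length];
        int n = 0;
        for (int row = 0; row < keyColumn.length; row++) {
            if (smallTableKeys.contains(keyColumn[row])) {
                selected[n++] = row;
            }
        }
        return Arrays.copyOf(selected, n);
    }

    // Step 2: decode the remaining column only for rows that survived the probe.
    static List<String> decodeSelected(byte[][] payloadColumn, int[] selected) {
        List<String> out = new ArrayList<>(selected.length);
        for (int row : selected) {
            out.add(decodePayload(payloadColumn[row]));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Long> smallTableKeys = Set.of(2L, 5L);
        long[] keyColumn = {1L, 2L, 3L, 5L};
        String[] raw = {"a", "b", "c", "d"};
        byte[][] payloadColumn = new byte[raw.length][];
        for (int i = 0; i < raw.length; i++) {
            payloadColumn[i] = raw[i].getBytes(StandardCharsets.UTF_8);
        }
        int[] selected = probe(keyColumn, smallTableKeys);
        System.out.println(decodeSelected(payloadColumn, selected)); // [b, d]
        System.out.println(decodeCalls); // 2 (of 4 payload values)
    }
}
```

Only two of the four payload values are decoded. Unlike the row-group pruning above, the decision is exact per row, because it consults the actual join keys rather than column statistics.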
--
This message was sent by Atlassian Jira
(v8.3.4#803005)