[jira] [Work started] (HIVE-22731) Use MapJoin hashtables for row level filtering

Panagiotis Garefalakis (Jira) Wed, 15 Jan 2020 09:06:48 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-22731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Work on HIVE-22731 started by Panagiotis Garefalakis.
-----------------------------------------------------
> Use MapJoin hashtables for row level filtering
> ----------------------------------------------
>
>                 Key: HIVE-22731
>                 URL: https://issues.apache.org/jira/browse/HIVE-22731
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, llap
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>         Attachments: decode_time_bars.pdf
>
>
> Currently, RecordReaders such as ORC support filtering only at coarse-grained
> levels, namely: File, Stripe (64 to 256 MB), and Row group (10k rows).
> They skip a set of rows only if they can guarantee that none of the rows can
> pass a filter (usually given as a searchable argument).
> However, a significant amount of time can be spent decoding rows with
> multiple columns that are not even used in the final result. See the attached
> figure, where "original" is what happens today and in "LazyDecode" we skip
> decoding rows that do not match the key.
> To enable more fine-grained filtering in the particular case of a MapJoin,
> we could use the key HashTable created from the smaller table to skip
> deserializing row columns of the larger table that do not match any key and
> thus save CPU time.
> This Jira investigates this direction.
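
The idea described above can be illustrated with a minimal, self-contained Java
sketch. It is not Hive or ORC code: LazyDecodeSketch, EncodedRow, Result,
decodePayload and probe are hypothetical names introduced only for this
example, and a plain HashSet stands in for the MapJoin key hashtable. The
sketch assumes the join key of each big-table row can be read without decoding
the remaining columns.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these types exist in Hive or ORC.
final class LazyDecodeSketch {

    /** A big-table row whose non-key columns are still in encoded form. */
    static final class EncodedRow {
        final long joinKey;   // assumed to be cheap to read on its own
        final byte[] payload; // remaining columns, not decoded yet

        EncodedRow(long joinKey, byte[] payload) {
            this.joinKey = joinKey;
            this.payload = payload;
        }
    }

    /** Rows that survived the key check, plus how many payloads were decoded. */
    static final class Result {
        final List<String> rows = new ArrayList<>();
        int decodedRows = 0;
    }

    /** Stand-in for the expensive per-row column decoding. */
    static String decodePayload(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    /**
     * Probes the big table against the keys of the small (build) table.
     * The payload is decoded only for rows whose key is present in buildKeys.
     */
    static Result probe(Set<Long> buildKeys, List<EncodedRow> bigTable) {
        Result result = new Result();
        for (EncodedRow row : bigTable) {
            if (!buildKeys.contains(row.joinKey)) {
                continue; // no matching key: skip decoding this row entirely
            }
            result.rows.add(row.joinKey + ":" + decodePayload(row.payload));
            result.decodedRows++;
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys collected from the small table.
        Set<Long> buildKeys = new HashSet<>(Arrays.asList(1L, 3L));

        // Five big-table rows with keys 1..5.
        List<EncodedRow> bigTable = new ArrayList<>();
        for (long key = 1; key <= 5; key++) {
            byte[] payload = ("row" + key).getBytes(StandardCharsets.UTF_8);
            bigTable.add(new EncodedRow(key, payload));
        }

        Result r = probe(buildKeys, bigTable);
        System.out.println(r.rows + " decoded " + r.decodedRows
                + " of " + bigTable.size() + " rows");
    }
}
```

In this toy setup the saving is simply the number of rows whose key is absent
from the build side. How much CPU time that would save in a real reader depends
on how the columns are encoded, which this sketch does not model.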



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
