[
https://issues.apache.org/jira/browse/HIVE-24205?focusedWorklogId=496055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-496055
]
ASF GitHub Bot logged work on HIVE-24205:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Oct/20 17:42
Start Date: 06/Oct/20 17:42
Worklog Time Spent: 10m
Work Description: mustafaiman closed pull request #1549:
URL: https://github.com/apache/hive/pull/1549
Issue Time Tracking
-------------------
Worklog Id: (was: 496055)
Time Spent: 20m (was: 10m)
> Optimise CuckooSetBytes
> -----------------------
>
> Key: HIVE-24205
> URL: https://issues.apache.org/jira/browse/HIVE-24205
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Mustafa Iman
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2020-09-28 at 4.29.24 PM.png, bench.png,
> vectorized.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> {{FilterStringColumnInList}}, {{StringColumnInList}}, etc. use
> {{CuckooSetBytes}} for lookup.
> !Screenshot 2020-09-28 at 4.29.24 PM.png|width=714,height=508!
> One option to optimize would be to add a bounds check on the lookup length
> against the min/max length of the values stored in the set (ref:
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/CuckooSetBytes.java#L85]).
> This would significantly reduce the number of hash computations that need
> to happen, e.g. in
> [TPCH-Q12|https://github.com/hortonworks/hive-testbench/blob/hdp3/sample-queries-tpch/tpch_query12.sql#L20].
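
The length bounds check proposed above can be sketched roughly as follows. This is
only an illustration of the idea, not Hive's CuckooSetBytes and not the patch
attached to the issue: the class name LengthBoundedByteSet, its methods, and the
hashedLookups counter are all hypothetical, and a plain java.util.HashSet stands
in for the cuckoo hash table. The set records the shortest and longest inserted
value, and a lookup whose length falls outside that range returns false before
any hash is computed.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; this is not Hive's CuckooSetBytes. */
final class LengthBoundedByteSet {
  // A HashSet stands in for the cuckoo hash table (assumption for brevity).
  private final Set<ByteBuffer> values = new HashSet<>();
  private int minLen = Integer.MAX_VALUE; // shortest inserted value so far
  private int maxLen = 0;                 // longest inserted value so far
  private long hashedLookups = 0;         // lookups that reached the hash step

  void insert(byte[] value) {
    // Copy the bytes so later changes to the caller's array do not affect the set.
    values.add(ByteBuffer.wrap(value.clone()));
    minLen = Math.min(minLen, value.length);
    maxLen = Math.max(maxLen, value.length);
  }

  /** Looks up bytes[start .. start+len); skips hashing when len is out of range. */
  boolean lookup(byte[] bytes, int start, int len) {
    if (len < minLen || len > maxLen) {
      return false; // no stored value has this length, so no hash is needed
    }
    hashedLookups++;
    // ByteBuffer hashes and compares only the wrapped [start, start+len) range.
    return values.contains(ByteBuffer.wrap(bytes, start, len));
  }

  long hashedLookups() {
    return hashedLookups;
  }
}
```

With an IN list whose values all have the same length, every probe of a
different length is rejected by two integer comparisons. The benefit shrinks as
the spread between the shortest and longest stored value grows.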