[ 
https://issues.apache.org/jira/browse/HIVE-20501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605175#comment-16605175
 ] 

Gopal V commented on HIVE-20501:
--------------------------------

This patch tries to eliminate the hash computation and bucket probe entirely 
for the HashSet and HashMultiSet cases, and to return a result directly from 
the min-max check.

isSimpleRange returns true if the key range is entirely contiguous, which it 
checks by comparing the total number of keys against the min and max longs.

If min=1, max=10 and 10 keys were assigned with no cases where newKey=false, 
then we can assume that the hashset contains every value in [1,10], so for any 
value between 1 and 10 no further lookup is necessary to return a result.
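
A minimal sketch of that check, not the code in the attached patch: 
ClosedRangeLongSet and its fields are hypothetical names, and a 
java.util.HashSet stands in for Hive's fast long hash table. Because the 
stand-in only stores distinct keys, "no cases where newKey=false" reduces to 
comparing the distinct key count with max - min + 1.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a long-keyed hash set; not Hive's actual class. */
final class ClosedRangeLongSet {
    private final Set<Long> keys = new HashSet<>();
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    /** Adds a key and tracks min/max; returns true if the key was new. */
    boolean add(long key) {
        boolean isNewKey = keys.add(key);
        if (isNewKey) {
            min = Math.min(min, key);
            max = Math.max(max, key);
        }
        return isNewKey;
    }

    /** True when the distinct keys cover every value in [min, max]. */
    boolean isSimpleRange() {
        if (keys.isEmpty()) {
            return false;
        }
        // max >= min here, so a negative difference means the subtraction overflowed.
        long span = max - min;
        return span >= 0 && span == keys.size() - 1L;
    }

    /** Answers from the min-max bounds alone when the range is closed. */
    boolean contains(long key) {
        if (isSimpleRange()) {
            return key >= min && key <= max; // no hash, no bucket probe
        }
        return keys.contains(key); // general path
    }
}
```

With keys 1 through 10 loaded, contains(5) is answered by the bounds check 
alone; with only keys 1 and 10 loaded, isSimpleRange() is false and the lookup 
falls back to the hash set. A real implementation would presumably compute the 
flag once after the hash table is built rather than on every probe.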

However, the inner-loop JIT profiles for this tell me that I need to move the 
branches up into VectorMapJoinLeftSemiLongOperator and 
VectorMapJoinInnerBigOnlyLongOperator.

> Vectorization: Closed range fast-path for Fast Long hashset 
> ------------------------------------------------------------
>
>                 Key: HIVE-20501
>                 URL: https://issues.apache.org/jira/browse/HIVE-20501
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Major
>         Attachments: HIVE-20501.1.patch
>
>
> In scenarios where the surrogate keys are entirely contiguous, the cache can 
> offer a fast-path for [min,max], without a further lookup in the hashtable.
> {code}
> hive> select min(c_customer_sk), max(c_customer_sk), max(c_customer_sk) - 
> min(c_customer_sk), count(1) from customer;
> 1       65000000        64999999        65000000
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
