[
https://issues.apache.org/jira/browse/HIVE-20501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605175#comment-16605175
]
Gopal V commented on HIVE-20501:
--------------------------------
This patch tries to eliminate the hash computation and bucket probe lookup
entirely for the HashSet and HashMultiSet cases, and to return directly from
the min-max result.
isSimpleRange returns true if the key range is entirely contiguous, which it
checks by comparing the total number of keys against the min and max longs.
If min=1, max=10 and there are 10 keys assigned, with no cases where
newKey=false, then the assumption can be made that the hashset contains exactly
[1,10], and therefore for any value between 1 and 10 no further lookups are
necessary to return a result.
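
As a rough illustration of that check (a minimal standalone sketch, not the
code in the patch): the class name ClosedRangeLongSet and its methods are
hypothetical, and a java.util.HashSet stands in for the Fast Long hash table.
Because the stand-in set only counts distinct keys, the "no newKey=false"
condition is satisfied by construction here.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of a closed-range fast path for a long key set.
 * This is NOT Hive's implementation; names and structure are assumptions.
 */
final class ClosedRangeLongSet {
    private final Set<Long> keys = new HashSet<>();
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    /** Adds a key and tracks min/max; duplicates do not change the count. */
    void add(long key) {
        if (keys.add(key)) {
            min = Math.min(min, key);
            max = Math.max(max, key);
        }
    }

    /** True when the distinct keys cover every value in [min, max]. */
    boolean isSimpleRange() {
        if (keys.isEmpty()) {
            return false;
        }
        // max - min can overflow for extreme ranges; a negative span
        // means overflow, which is treated as "not a simple range".
        long span = max - min;
        return span >= 0 && span == keys.size() - 1L;
    }

    /** Membership test that skips the hash probe for contiguous ranges. */
    boolean contains(long key) {
        if (key < min || key > max) {
            return false;            // rejected by the min-max filter
        }
        if (isSimpleRange()) {
            return true;             // closed range: no probe needed
        }
        return keys.contains(key);   // general case: hash and probe
    }
}
```

With keys 1 through 10 added, isSimpleRange() holds and contains(5) returns
without touching the hash set; with keys {1, 2, 4} it does not hold, so
contains(3) falls through to the normal probe.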
However, the JIT profiles of the inner loop tell me that I need to move the
branches up into VectorMapJoinLeftSemiLongOperator and
VectorMapJoinInnerBigOnlyLongOperator.
> Vectorization: Closed range fast-path for Fast Long hashset
> ------------------------------------------------------------
>
> Key: HIVE-20501
> URL: https://issues.apache.org/jira/browse/HIVE-20501
> Project: Hive
> Issue Type: Improvement
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Attachments: HIVE-20501.1.patch
>
>
> In scenarios where the surrogate keys are entirely contiguous, the cache can
> offer a fast-path for [min,max], without a further lookup in the hashtable.
> {code}
> hive> select min(c_customer_sk), max(c_customer_sk), max(c_customer_sk) -
> min(c_customer_sk), count(1) from customer;
> 1 65000000 64999999 65000000
> {code}