[ 
https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206145#comment-17206145
 ] 

Stamatis Zampetakis commented on HIVE-23976:
--------------------------------------------

Hi [~abstractdog],

While working on HIVE-24221, I got some further questions/ideas regarding this 
issue.

It seems that we make use of n-ary vectorized expressions for the evaluation of 
AND and OR operators; it's true that this is not done with the descriptor but 
through {{VectorizationContext}}. I am not sure what this means in terms of 
efficiency, but it looks like we are saving at least some memory, since I get 
the impression that we can reuse the output vector instead of having a 
different output vector per pair of binary operations. We could employ 
something similar for an n-ary hash function.
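
To make the n-ary idea concrete, here is a minimal sketch under stated assumptions, not Hive code. The names {{NaryHashSketch}}, {{mix}} and {{hashColumns}} are hypothetical, columns are modelled as plain {{long[]}} arrays rather than Hive column vectors, and {{mix}} is only a stand-in for Murmur (it borrows the 32-bit finalizer step of MurmurHash3). The point is that a single output array is reused while folding over all input columns:

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Hive.
final class NaryHashSketch {

    // Stand-in for Murmur: folds the long to 32 bits, then applies the
    // MurmurHash3 32-bit finalizer mix. Not Hive's actual hash function.
    static int mix(long value) {
        int h = (int) (value ^ (value >>> 32));
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hashes the first `size` rows of every column into one shared output
    // array, so no intermediate vector is needed per pair of inputs.
    static void hashColumns(long[][] columns, int size, int[] output) {
        Arrays.fill(output, 0, size, 0); // reset so the array can be reused
        for (long[] column : columns) {
            for (int row = 0; row < size; row++) {
                output[row] = 31 * output[row] + mix(column[row]);
            }
        }
    }
}
```

With two input columns this fold yields 31*mix(a) + mix(b) per row, i.e. the same shape as the current binary combination, and it extends to any number of columns without extra output vectors.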

Assuming that we cannot/should not treat the hash as an n-ary operator, I 
think it makes more sense to make it unary (single input, single output) 
instead of binary, so that it is only a kind of wrapper around Murmur for the 
different datatypes. By doing this the implementation will be simpler and we 
can cover more use-cases, as the combine step is delegated to another 
abstraction.

+Currently+ 
{noformat}
hash(a,b) = 31*murmur(a) + murmur(b)
{noformat}

+After+
{noformat}
hash(a) = murmur(a)
{noformat}
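
As a rough sketch of the unary variant (an illustration under assumptions, not a proposed implementation): {{UnaryHashSketch}}, {{mix}}, {{hash}} and {{combine}} are hypothetical names, vectors are modelled as plain arrays, and {{mix}} is only a stand-in for Murmur based on the MurmurHash3 32-bit finalizer. The hash step knows nothing about combining; a separate step reproduces the current formula.

```java
// Hypothetical illustration only; none of these names exist in Hive.
final class UnaryHashSketch {

    // Stand-in for Murmur: folds the long to 32 bits, then applies the
    // MurmurHash3 32-bit finalizer mix. Not Hive's actual hash function.
    static int mix(long value) {
        int h = (int) (value ^ (value >>> 32));
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Unary step: one input vector in, one output vector out.
    static int[] hash(long[] input) {
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = mix(input[i]);
        }
        return out;
    }

    // Separate combine step: updates `acc` in place and returns it.
    static int[] combine(int[] acc, int[] next) {
        for (int i = 0; i < acc.length; i++) {
            acc[i] = 31 * acc[i] + next[i];
        }
        return acc;
    }
}
```

Under these assumptions, {{combine(hash(a), hash(b))}} computes 31*mix(a) + mix(b) per row, so the existing two-column behaviour is still expressible while the hash itself stays a simple per-datatype wrapper.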

What do you think?

> Enable vectorization for multi-col semi join reducers
> -----------------------------------------------------
>
>                 Key: HIVE-23976
>                 URL: https://issues.apache.org/jira/browse/HIVE-23976
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Stamatis Zampetakis
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. 
> However, the implementation relies on GenericUDFMurmurHash, which is not 
> vectorized, so the respective operators cannot be executed in vectorized 
> mode. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
