[
https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17206145#comment-17206145
]
Stamatis Zampetakis commented on HIVE-23976:
--------------------------------------------
Hi [~abstractdog],
While working on HIVE-24221, I came up with some further questions/ideas
regarding this issue.
It seems that we already use n-ary vectorized expressions to evaluate the AND
and OR operators; it's true that this is not done through the descriptor but
through {{VectorizationContext}}. I am not sure what this means in terms of
efficiency, but it looks like we save at least some memory, since I get the
impression that we can reuse a single output vector instead of allocating a
separate output vector for each pair of binary operations. We could employ
something similar for an n-ary hash function.
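To make the memory argument concrete, here is a minimal sketch in plain Java. It assumes that an n-ary hash over more than two columns folds left as h = 31*h + murmur(c), generalizing the two-argument formula; plain long[] arrays stand in for Hive's column vectors, and murmur() is a placeholder built from the MurmurHash3 64-bit finalizer, not Hive's own Murmur implementation. The names NaryHashSketch, binaryChain and naryHash are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: long[] arrays stand in for Hive column vectors, and
// murmur() is a placeholder (MurmurHash3 fmix64), not Hive's Murmur3 class.
public class NaryHashSketch {

    // MurmurHash3 64-bit finalizer, used only as a stand-in for murmur(x).
    static long murmur(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Chain of binary steps: each step allocates a fresh intermediate vector.
    static long[] binaryChain(long[][] columns) {
        int rows = columns[0].length;
        long[] acc = new long[rows];
        for (int r = 0; r < rows; r++) {
            acc[r] = murmur(columns[0][r]);
        }
        for (int c = 1; c < columns.length; c++) {
            long[] next = new long[rows]; // one extra output vector per binary step
            for (int r = 0; r < rows; r++) {
                next[r] = 31 * acc[r] + murmur(columns[c][r]);
            }
            acc = next;
        }
        return acc;
    }

    // N-ary expression: one caller-provided output vector is reused for all inputs.
    // Assumed fold for n > 2: h = 31 * h + murmur(column value).
    static void naryHash(long[][] columns, long[] out) {
        for (int r = 0; r < out.length; r++) {
            out[r] = murmur(columns[0][r]);
        }
        for (int c = 1; c < columns.length; c++) {
            long[] col = columns[c];
            for (int r = 0; r < out.length; r++) {
                out[r] = 31 * out[r] + murmur(col[r]);
            }
        }
    }

    public static void main(String[] args) {
        long[][] cols = { {1, 2, 3}, {10, 20, 30}, {7, 8, 9} };
        long[] out = new long[3];
        naryHash(cols, out);
        // Both strategies yield the same hashes; only the allocations differ.
        System.out.println(Arrays.equals(out, binaryChain(cols))); // true
    }
}
```

The binary chain allocates a new vector at every step, while the n-ary version writes every intermediate result into the same output vector; whether the saving matters in practice would still need to be measured.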
Assuming that we cannot/should not treat the hash as an n-ary operator, I
think it makes more sense to make it unary (single input, single output)
instead of binary, acting only as a thin wrapper around Murmur for the different
datatypes. This way the implementation will be simpler and we can cover more
use-cases, since the combine step is delegated to another abstraction.
+Currently+
{noformat}
hash(a,b) = 31*murmur(a) + murmur(b)
{noformat}
+After+
{noformat}
hash(a) = murmur(a)
{noformat}
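A minimal sketch of the proposed split, assuming hypothetical names (UnaryHashSketch, hash, hash2, COMBINE_31) and the same fmix64 placeholder for murmur(); mapping a double to its bits via Double.doubleToLongBits is also an assumption, not necessarily what Hive does. The unary functions only wrap murmur() per datatype, and the current binary formula is rebuilt from them by a separate combine step.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical sketch of a unary hash plus a separate combine step.
// murmur() is a placeholder (MurmurHash3 fmix64), not Hive's Murmur3 class.
public class UnaryHashSketch {

    // Stand-in for murmur(x): the MurmurHash3 64-bit finalizer.
    static long murmur(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Unary wrappers, one per input datatype: hash(a) = murmur(a).
    static long hash(long v) { return murmur(v); }
    static long hash(double v) { return murmur(Double.doubleToLongBits(v)); } // assumed encoding
    static long hash(boolean v) { return murmur(v ? 1L : 0L); }

    // The combine step lives in its own abstraction and can be swapped out.
    static final LongBinaryOperator COMBINE_31 = (acc, h) -> 31 * acc + h;

    // Current binary behaviour rebuilt from the unary pieces:
    // hash(a, b) = 31 * murmur(a) + murmur(b)
    static long hash2(long a, double b) {
        return COMBINE_31.applyAsLong(hash(a), hash(b));
    }

    public static void main(String[] args) {
        long expected = 31 * murmur(42L) + murmur(Double.doubleToLongBits(1.5));
        System.out.println(hash2(42L, 1.5) == expected); // true
    }
}
```

Because the combine step is a separate LongBinaryOperator, a different combining strategy could be plugged in without touching the per-datatype hash functions.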
What do you think?
> Enable vectorization for multi-col semi join reducers
> -----------------------------------------------------
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
> Issue Type: Improvement
> Reporter: Stamatis Zampetakis
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine.
> However, the implementation relies on GenericUDFMurmurHash, which is not
> vectorized, so the respective operators cannot be executed in vectorized
> mode.