[
https://issues.apache.org/jira/browse/HIVE-18875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488047#comment-16488047
]
Deepak Jaiswal commented on HIVE-18875:
---------------------------------------
[~hagleitn] Thanks for looking into the patch. Your questions are in italics.
_The join operator should not have to check whether the parent is a group by -
that seems brittle. Can we always force the logic to close other branches to
flush out remaining records? If we introduce any other blocking operators the
same logic has to apply, right?_
I agree it is brittle. The check was put there to establish that the join is on
the reducer side, so that we don't execute the logic otherwise. I think it will
be simpler to add a flag somewhere, or to use some other existing information,
to establish the same thing.
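As a rough illustration of the difference between the two approaches, here is a
minimal sketch. All class and method names are hypothetical and are not Hive's
actual operators; it only contrasts inferring "reducer side" from the parent's
type with reading an explicit flag.

```java
public class ReducerSideFlagSketch {
    // Hypothetical operator types; these are NOT Hive's classes.
    static class Operator {}
    static class GroupByLikeOperator extends Operator {}
    static class OtherBlockingOperator extends Operator {}

    // Brittle: infers "reducer side" from the concrete type of the parent,
    // so any new blocking operator is silently missed.
    static boolean shouldFlushByParentType(Operator parent) {
        return parent instanceof GroupByLikeOperator;
    }

    // Alternative: the fact is recorded once (assumed here to be set by the
    // planner) and the join only reads it.
    static boolean shouldFlushByFlag(boolean onReducerSide) {
        return onReducerSide;
    }

    public static void main(String[] args) {
        Operator parent = new OtherBlockingOperator();
        System.out.println(shouldFlushByParentType(parent)); // false
        System.out.println(shouldFlushByFlag(true));         // true
    }
}
```

With the type check, a second kind of blocking parent is not recognized, whereas
the flag does not depend on which operator sits above the join.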
_Not using the tag in the group by operator (hard code to 0) seems wrong, why
is that the correct thing to do?_
The tag is irrelevant in GBY. As of now, nothing other than SMB uses the tag.
GBY always has exactly one OI, and SMB may send tag 1 or larger, which causes an
ArrayIndexOutOfBoundsException.
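The failure mode can be sketched in isolation. This is a simplified stand-in
with hypothetical names, not the actual GroupByOperator code: it assumes a
one-element array of inspectors that is indexed by the incoming tag.

```java
public class TagIndexSketch {
    // Hypothetical stand-in for the operator's per-input inspectors.
    // There is exactly one input, so the array has exactly one slot.
    static final String[] INPUT_INSPECTORS = {"inspector-for-the-only-parent"};

    // Indexes by whatever tag arrives; throws for any tag >= 1.
    static String lookupByTag(int tag) {
        return INPUT_INSPECTORS[tag];
    }

    // Ignores the incoming tag and always uses slot 0.
    static String lookupIgnoringTag(int tag) {
        return INPUT_INSPECTORS[0];
    }

    public static void main(String[] args) {
        // A tag of 1 is harmless when the tag is ignored.
        System.out.println(lookupIgnoringTag(1));
        try {
            lookupByTag(1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("tag 1 is out of bounds");
        }
    }
}
```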
_Why are you turning sortmerge join conversion off explicitly in some test
files? Can you explain + add comment there?_
Most of those tests already test SMB. I structured them so that each query
first runs without SMB and then with SMB. Since SMB is now on by default, it
has to be turned off explicitly for the sections that run without it.
subquery_notin.q had a typo in it, which I fixed after discussing it with
[~vgarg]. The query that ran and the explain before it were different.
_If I read this right, then your new check in ConvertJoinMapjoin basically
makes sure that there is no projection in between gby and join that would alter
bucketing or sorting. That is exactly what op traits are for - why can't we use
that in this case?_
That would be the ideal case. It looks like op traits in their current form are
not sufficient to handle all SMB cases. Maybe I can do it as a follow-up?
> Enable SMB Join by default in Tez
> ---------------------------------
>
> Key: HIVE-18875
> URL: https://issues.apache.org/jira/browse/HIVE-18875
> Project: Hive
> Issue Type: Task
> Reporter: Deepak Jaiswal
> Assignee: Deepak Jaiswal
> Priority: Major
> Attachments: HIVE-18875.1.patch, HIVE-18875.2.patch,
> HIVE-18875.3.patch, HIVE-18875.4.patch, HIVE-18875.5.patch, HIVE-18875.6.patch
>
>