[
https://issues.apache.org/jira/browse/HIVE-20504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606919#comment-16606919
]
Zoltan Haindrich commented on HIVE-20504:
-----------------------------------------
[~gopalv] this is not just about bmj; consider the following case:
* 2 tables with roughly the same data size - both fit into memory
* estimated buckets > 1 (enables that logic)
* the number of LLAP nodes comes out as >= 3
* dphj is selected on the basis of network cost
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L242
I ran a few small measurements for this, and it looked like mj finished
faster, but the dataset I used may have been too small.
I'll repeat it with a bigger set.
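To make the case above concrete, here is a minimal toy sketch of the two
selection orders. It is not the code in ConvertJoinMapJoin: the class, enum
and method names are hypothetical, SHUFFLE_JOIN is only a placeholder
fallback, and the `llapNodes >= 3` check stands in for the real network cost
comparison, using the threshold from the example.

```java
// Toy model only; names and thresholds are assumptions, not Hive's API.
public class JoinChoiceSketch {
    enum JoinKind { MAP_JOIN, BUCKET_MAP_JOIN, DPHJ, SHUFFLE_JOIN }

    // Order as described in this issue: bucketized variants are tried
    // before the simple map join whenever estimated buckets > 1.
    static JoinKind chooseCurrent(boolean fitsInMemory, int estimatedBuckets, int llapNodes) {
        if (estimatedBuckets > 1) {
            // Stand-in for the network cost comparison (assumption).
            return llapNodes >= 3 ? JoinKind.DPHJ : JoinKind.BUCKET_MAP_JOIN;
        }
        return fitsInMemory ? JoinKind.MAP_JOIN : JoinKind.SHUFFLE_JOIN;
    }

    // Order suggested by the issue title: try the simple map join first.
    static JoinKind chooseProposed(boolean fitsInMemory, int estimatedBuckets, int llapNodes) {
        if (fitsInMemory) {
            return JoinKind.MAP_JOIN;
        }
        return chooseCurrent(fitsInMemory, estimatedBuckets, llapNodes);
    }

    public static void main(String[] args) {
        // The case from the comment: fits in memory, 2 buckets, 3 LLAP nodes.
        System.out.println("current:  " + chooseCurrent(true, 2, 3));
        System.out.println("proposed: " + chooseProposed(true, 2, 3));
    }
}
```

In this toy model the example case yields DPHJ under the current order and
MAP_JOIN under the proposed one. Which of the two is actually faster is
exactly what the repeated measurement needs to show.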
> Give simple MJ bigger priority than bucketized ones
> ---------------------------------------------------
>
> Key: HIVE-20504
> URL: https://issues.apache.org/jira/browse/HIVE-20504
> Project: Hive
> Issue Type: Improvement
> Components: Statistics
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Attachments: HIVE-20504.01.patch, HIVE-20504.01.patch,
> HIVE-20504.01wip01.patch, HIVE-20504.01wip01.patch
>
>
> from the code it seems "standard" mapjoin is one of the last ones tried; in
> case the table is estimated to be bucketed into 2 but is small, Hive will
> do a bucketmapjoin or dphj, even though a simple mapjoin could have been an
> alternative.
> https://github.com/apache/hive/blob/154ca3e3b5eb78cd49a4b3650c750ca731fba7da/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java#L157