[
https://issues.apache.org/jira/browse/HIVE-2095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
He Yongqiang updated HIVE-2095:
-------------------------------
Attachment: HIVE-2095.2.patch
> auto convert map join bug
> -------------------------
>
> Key: HIVE-2095
> URL: https://issues.apache.org/jira/browse/HIVE-2095
> Project: Hive
> Issue Type: Bug
> Reporter: He Yongqiang
> Assignee: He Yongqiang
> Attachments: HIVE-2095.1.patch, HIVE-2095.2.patch
>
>
> 1)
> when considering to choose one table as the big table candidate for a map
> join, if at compile time, hive can find out that the total known size of all
> other tables excluding the big table in consideration is bigger than a
> configured value, this big table candidate is a bad one, and should not put
> into plan. Otherwise, at runtime to filter this out may cause more time.
> 2)
> added a null check for back up tasks. Otherwise will see NullPointerException
> 3)
> CommonJoinResolver needs to know a full mapping of pathToAliases. Otherwise
> it will make wrong decision.
> 4)
> changes made to the ConditionalResolverCommonJoin: added pathToAliases,
> aliasToSize (alias's input size that is known at compile time, by
> inputSummary), and intermediate dir path.
> So the logic is, go over all the pathToAliases, and for each path, if it is
> from intermediate dir path, add this path's size to all aliases. And finally
> based on the size information and others like aliasToTask to choose the big
> table.
> 5)
> Conditional task's children contains wrong options, which may cause join fail
> or incorrect results. Basically when getting all possible children for the
> conditional task, should use a whitelist of big tables. Only tables in this
> while list can be considered as a big table.
> Here is the logic:
> + * Get a list of big table candidates. Only the tables in the returned set
> can
> + * be used as big table in the join operation.
> + *
> + * The logic here is to scan the join condition array from left to right.
> If
> + * see a inner join and the bigTableCandidates is empty, add both side of
> this
> + * inner join to big table candidates. If see a left outer join, and the
> + * bigTableCandidates is empty, add the left side to it, and if the
> + * bigTableCandidates is not empty, do nothing (which means the
> + * bigTableCandidates is from left side). If see a right outer join, clear
> the
> + * bigTableCandidates, and add right side to the bigTableCandidates, it
> means
> + * the right side of a right outer join always win. If see a full outer
> join,
> + * return null immediately (no one can be the big table, can not do a
> + * mapjoin).
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira