Kevin Wilfong created HIVE-3496:
-----------------------------------
Summary: Query plan for multi-join where the third table joined is
a subquery containing a map-only union with hive.auto.convert.join=true is wrong
Key: HIVE-3496
URL: https://issues.apache.org/jira/browse/HIVE-3496
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
Take the following query as an example:
EXPLAIN SELECT * FROM
src11 a JOIN
src12 b ON (a.key = b.key) JOIN
(SELECT * FROM (SELECT * FROM src13 UNION ALL SELECT * FROM src14)a )c ON
c.value = b.value;
When hive.auto.convert.join=true, the two joins are implemented separately as
conditional tasks, each with two mapjoins and a backup common join. For the
second join, the backup common join task ends up both as a backup task
contained in the ConditionalTask and as a root task. This is clearly wrong and
leads to query failures.
I've traced this to the joinUnionPlan method of GenMapRedUtils. If the union
operator was executed in its own map-reduce task and that task could be a root
task, then when the union is merged into the mapper of the existing task (the
one that performs the join in its reducer), the existing task is made a root
task without first checking whether the existing (non-union) task has any
dependencies.
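
The missing check can be illustrated with a small, self-contained model. This
is only a sketch under stated assumptions, not the actual Hive code: the Task
class, the rootTasks list, and the mergeUnchecked/mergeChecked methods are
hypothetical stand-ins for the real plan structures, and the ConditionalTask
wrapper is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class RootTaskSketch {
    /** Hypothetical stand-in for a map-reduce task in the plan (not Hive's Task class). */
    static final class Task {
        final String name;
        final List<Task> parents = new ArrayList<>();

        Task(String name) {
            this.name = name;
        }
    }

    /** Models the reported behaviour: the existing task inherits root status unconditionally. */
    static void mergeUnchecked(List<Task> rootTasks, Task unionTask, Task existingTask) {
        if (rootTasks.remove(unionTask) && !rootTasks.contains(existingTask)) {
            rootTasks.add(existingTask);
        }
    }

    /** Models one possible guard: promote the existing task only if it has no parents. */
    static void mergeChecked(List<Task> rootTasks, Task unionTask, Task existingTask) {
        boolean unionWasRoot = rootTasks.remove(unionTask);
        if (unionWasRoot && existingTask.parents.isEmpty()
                && !rootTasks.contains(existingTask)) {
            rootTasks.add(existingTask);
        }
    }

    /** Builds the scenario from the report and returns the names of the resulting root tasks. */
    static List<String> demo(boolean checked) {
        Task firstJoin = new Task("firstJoin");   // a JOIN b
        Task secondJoin = new Task("secondJoin"); // backup common join with c
        Task unionTask = new Task("union");       // map-only union of src13 and src14
        secondJoin.parents.add(firstJoin);        // the second join depends on the first

        List<Task> rootTasks = new ArrayList<>();
        rootTasks.add(firstJoin);
        rootTasks.add(unionTask);

        if (checked) {
            mergeChecked(rootTasks, unionTask, secondJoin);
        } else {
            mergeUnchecked(rootTasks, unionTask, secondJoin);
        }

        List<String> names = new ArrayList<>();
        for (Task t : rootTasks) {
            names.add(t.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("unchecked roots: " + demo(false));
        System.out.println("checked roots:   " + demo(true));
    }
}
```

In this model the unchecked merge leaves secondJoin in the root list even
though it still depends on firstJoin, which mirrors the symptom described
above. The checked variant keeps only firstJoin as a root.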