Namit Jain created HIVE-3477:
--------------------------------
Summary: Duplicate data possible with speculative execution for
dynamic partitions
Key: HIVE-3477
URL: https://issues.apache.org/jira/browse/HIVE-3477
Project: Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
Consider a query like:
insert overwrite T partition (ds)
select * from
(mapreduce-subq1
union all
mapreduce-subq2)x;
Once mapreduce-subq1 and mapreduce-subq2 are done, the task for the union
is invoked. At the end of the union task, jobClose is invoked.
Note that there are 2 tablescan operators. The tree is something like:
TABLESCAN1 --
              \
               UNION -- SELECT -- FILESINK
              /
TABLESCAN2 --
In the current setup, jobClose will be invoked twice for FileSink.
With speculative execution, it is possible that data is still being
written to the tmp dir after jobClose has finished once.
The correct fix would be to make sure that jobClose is only invoked once.
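
A minimal sketch of the idea, not the actual Hive code: it assumes the double
invocation comes from walking the operator tree once from each tablescan
root, so the shared FileSink is reached along both paths. The Op class and
the jobCloseNaive / jobCloseOnce / fileSinkCloses names are hypothetical and
exist only for this illustration; the guard is a plain identity-based
"already closed" set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class JobCloseSketch {
    /** Hypothetical stand-in for an operator node; not Hive's Operator class. */
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        int jobCloseCalls = 0;

        Op(String name) { this.name = name; }

        /** Adds child below this operator and returns the child for chaining. */
        Op to(Op child) { children.add(child); return child; }
    }

    /** Naive walk: an operator shared by two roots is closed once per path. */
    static void jobCloseNaive(Op op) {
        op.jobCloseCalls++;
        for (Op child : op.children) {
            jobCloseNaive(child);
        }
    }

    /** Guarded walk: each operator is closed at most once across all roots. */
    static void jobCloseOnce(Op op, Set<Op> closed) {
        if (!closed.add(op)) {
            return; // already closed via another root
        }
        op.jobCloseCalls++;
        for (Op child : op.children) {
            jobCloseOnce(child, closed);
        }
    }

    /** Builds the tree from this report and returns how often FILESINK was closed. */
    public static int fileSinkCloses(boolean guarded) {
        Op ts1 = new Op("TABLESCAN1");
        Op ts2 = new Op("TABLESCAN2");
        Op union = new Op("UNION");
        Op select = new Op("SELECT");
        Op fileSink = new Op("FILESINK");
        ts1.to(union);
        ts2.to(union);
        union.to(select).to(fileSink);

        // Identity-based set: operators are tracked by reference, not equals().
        Set<Op> closed = Collections.newSetFromMap(new IdentityHashMap<Op, Boolean>());
        for (Op root : Arrays.asList(ts1, ts2)) {
            if (guarded) {
                jobCloseOnce(root, closed);
            } else {
                jobCloseNaive(root);
            }
        }
        return fileSink.jobCloseCalls;
    }

    public static void main(String[] args) {
        System.out.println("naive walk, FILESINK closes:   " + fileSinkCloses(false));
        System.out.println("guarded walk, FILESINK closes: " + fileSinkCloses(true));
    }
}
```

In this toy model the naive walk closes FILESINK twice and the guarded walk
closes it once. Whether the real fix belongs in the traversal or in the
operator itself is left open here.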