Chris Drome created HIVE-13756:
----------------------------------
Summary: Map failure attempts to delete reducer _temporary
directory on multi-query pig query
Key: HIVE-13756
URL: https://issues.apache.org/jira/browse/HIVE-13756
Project: Hive
Issue Type: Bug
Components: HCatalog
Affects Versions: 2.0.0, 1.2.1
Reporter: Chris Drome
Assignee: Chris Drome
A Pig script executed with multi-query enabled can produce erroneous results if
any map fails. The affected script reads the source data, writes it as-is into
TABLE_A, and also performs a group-by operation on the data, writing the result
into TABLE_B. Such a script runs as a single MR job that writes the map output
to a scratch directory relative to TABLE_A and the reducer output to a scratch
directory relative to TABLE_B.
If one or more maps fail, the failure handling deletes the attempt data relative
to TABLE_A, but it also deletes the _temporary directory relative to TABLE_B.
This has the unintended side effect of preventing subsequent maps from
committing their data. As a result, maps that completed successfully before the
first map failure have their data committed as expected, but the remaining maps
do not, resulting in an incomplete result set.
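
The failure mode can be illustrated with a small, self-contained simulation.
This is only a sketch under stated assumptions, not HCatalog's committer code:
the scratch directory layout, the attempt naming, the sequential execution of
maps, and the rule that a map can commit only while the TABLE_B _temporary
directory exists are all simplifications chosen to reproduce the reported
symptom. The function name simulate and its parameters are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def simulate(num_maps, failing_map, buggy_abort=True):
    """Return the ids of the map tasks whose output ends up committed.

    Maps run one after another (a simplification; real maps run in parallel).
    The first attempt of `failing_map` fails and is retried once.
    """
    root = Path(tempfile.mkdtemp())
    try:
        # Hypothetical scratch layout; the real paths are chosen by HCatalog.
        a_tmp = root / "table_a" / "_SCRATCH" / "_temporary"
        b_tmp = root / "table_b" / "_SCRATCH" / "_temporary"
        a_tmp.mkdir(parents=True)
        b_tmp.mkdir(parents=True)

        committed = []
        for map_id in range(num_maps):
            attempt_no = 0
            if map_id == failing_map:
                failed = a_tmp / f"attempt_m_{map_id:06d}_0"
                failed.mkdir()
                # Expected cleanup: remove the failed attempt under TABLE_A.
                shutil.rmtree(failed)
                if buggy_abort:
                    # Reported bug: TABLE_B's _temporary directory goes too.
                    shutil.rmtree(b_tmp, ignore_errors=True)
                attempt_no = 1  # the retry uses a fresh attempt directory

            attempt = a_tmp / f"attempt_m_{map_id:06d}_{attempt_no}"
            attempt.mkdir()
            (attempt / "part").write_text(f"rows from map {map_id}\n")

            # Assumed commit rule: a map can commit only while the
            # TABLE_B _temporary directory still exists.
            if b_tmp.exists():
                committed.append(map_id)
        return committed
    finally:
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    print(simulate(5, failing_map=2))                     # [0, 1]
    print(simulate(5, failing_map=2, buggy_abort=False))  # [0, 1, 2, 3, 4]
```

With the buggy abort, only the maps that finished before the failure are
committed. When the abort removes only the failed attempt's own directory,
every map commits, including the retried one.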