Chris Drome created HIVE-13756:
----------------------------------

             Summary: Map failure attempts to delete reducer _temporary 
directory on multi-query pig query
                 Key: HIVE-13756
                 URL: https://issues.apache.org/jira/browse/HIVE-13756
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
    Affects Versions: 2.0.0, 1.2.1
            Reporter: Chris Drome
            Assignee: Chris Drome


A Pig script, executed with multi-query enabled, that reads the source data, 
writes it as-is into TABLE_A, and also performs a group-by operation on the 
data and writes the result into TABLE_B can produce erroneous results if any 
map fails. Such a script results in a single MR job that writes the map output 
to a scratch directory relative to TABLE_A and the reducer output to a scratch 
directory relative to TABLE_B.

If one or more maps fail, the failure handling deletes the attempt data 
relative to TABLE_A, but it also deletes the _temporary directory relative to 
TABLE_B. This has the unintended side effect of preventing subsequent maps from 
committing their data. As a result, maps that completed successfully before the 
first map failure have their data committed as expected, while later maps do 
not, resulting in an incomplete result set.
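
The failure sequence described above can be illustrated with a small, 
self-contained simulation. This is only a sketch and does not use any Hive, 
HCatalog, or Hadoop code: the function name run_job, the buggy_abort flag, the 
directory names other than _temporary, and the rule that a map commit requires 
TABLE_B's _temporary directory to exist are all assumptions made for 
illustration.

```python
import shutil
import tempfile
from pathlib import Path


def run_job(map_outcomes, buggy_abort=True):
    """Simulate map tasks in order; return ids of maps whose data was committed.

    map_outcomes: list of booleans, True = map succeeds, False = map fails.
    buggy_abort:  if True, a failed map also removes TABLE_B's _temporary.
    """
    root = Path(tempfile.mkdtemp())
    try:
        # Hypothetical layout: one scratch area per output table.
        scratch_a = root / "TABLE_A" / "_scratch"
        temp_b = root / "TABLE_B" / "_temporary"
        committed_a = root / "TABLE_A" / "committed"
        for directory in (scratch_a, temp_b, committed_a):
            directory.mkdir(parents=True)

        committed = []
        for map_id, succeeded in enumerate(map_outcomes):
            attempt = scratch_a / f"attempt_{map_id}"
            attempt.mkdir()
            (attempt / "part").write_text(f"rows from map {map_id}\n")

            if not succeeded:
                # Expected cleanup: remove the failed attempt's own data.
                shutil.rmtree(attempt)
                if buggy_abort:
                    # Behavior reported in this issue: the abort also removes
                    # the _temporary directory that belongs to TABLE_B.
                    shutil.rmtree(temp_b, ignore_errors=True)
                continue

            # Assumption: a map can only commit while _temporary still exists.
            if temp_b.exists():
                shutil.move(str(attempt), str(committed_a / attempt.name))
                committed.append(map_id)
        return committed
    finally:
        shutil.rmtree(root)
```

Under these assumptions, run_job([True, True, False, True, True]) commits only 
maps 0 and 1, matching the incomplete result set described above, while the 
same call with buggy_abort=False commits maps 0, 1, 3, and 4.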



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
