[ 
https://issues.apache.org/jira/browse/HIVE-13756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414342#comment-15414342
 ] 

Mithun Radhakrishnan commented on HIVE-13756:
---------------------------------------------

+1. 

> Map failure attempts to delete reducer _temporary directory on multi-query 
> pig query
> ------------------------------------------------------------------------------------
>
>                 Key: HIVE-13756
>                 URL: https://issues.apache.org/jira/browse/HIVE-13756
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 1.2.1, 2.0.0
>            Reporter: Chris Drome
>            Assignee: Chris Drome
>         Attachments: HIVE-13756-branch-1.patch, HIVE-13756.1-branch-1.patch, 
> HIVE-13756.1.patch, HIVE-13756.patch
>
>
> A pig script, executed with multi-query enabled, that reads the source data 
> and writes it as-is into TABLE_A as well as performing a group-by operation 
> on the data which is written into TABLE_B can produce erroneous results if 
> any map fails. This results in a single MR job that writes the map output to 
> a scratch directory relative to TABLE_A and the reducer output to a scratch 
> directory relative to TABLE_B.
> If one or more maps fail it will delete the attempt data relative to TABLE_A, 
> but it also deletes the _temporary directory relative to TABLE_B. This has 
> the unintended side-effect of preventing subsequent maps from committing 
> their data. This means that any maps which successfully completed before the 
> first map failure will have its data committed as expected, other maps not, 
> resulting in an incomplete result set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to