[ https://issues.apache.org/jira/browse/PIG-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345910#comment-14345910 ]

Rohini Palaniswamy commented on PIG-4412:
-----------------------------------------

bq. How about making ExecutableManager.close synchronized?
Based on the Stack Overflow problem description, the user's problem is that
ExecutableManager.close() is not called in some cases. Before this patch, the
POStream.finish() method was never called by any code; ExecutableManager.close()
was called directly, but only in ProcessInputThread. We need to find the exact
case where close() is missed and add the call there. Adding it in every place
where it might be needed closes the manager prematurely in some cases, which
causes the NPE and hanging problems.

> Race condition in writing multiple outputs from STREAM op
> ---------------------------------------------------------
>
>                 Key: PIG-4412
>                 URL: https://issues.apache.org/jira/browse/PIG-4412
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.15.0
>
>         Attachments: PIG-4412.patch
>
>
> Basically copying the issue described here:
> http://stackoverflow.com/questions/28327044/pig-streaming-some-output-files-are-missing
> Roughly, I believe the issue is a race condition between the code in
> HadoopExecutableManager that moves a script's multiple output files into
> HDFS and the MapReduce task, which shuts down after it writes the last bits
> of the STREAM task's "main" output. Pig needs to make sure that the
> ExecutableManager is closed (and thus the files are moved from the local dir
> to HDFS) before it returns the end-of-stream tuple that signals the stream is
> finished.
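The ordering requirement in the issue description (close the ExecutableManager, and thus move the side-output files to HDFS, before returning the end-of-stream tuple) can be illustrated with a small simulation. This is a sketch under stated assumptions, not Pig's code: `SketchExecutableManager`, `finish_stream`, `run_task` and `END_OF_STREAM` are hypothetical names, a background thread stands in for the streaming script, and the local dir and HDFS are plain Python lists. The racy ordering is made deterministic by having the "task" snapshot HDFS as soon as it sees end-of-stream, before close() runs.

```python
import threading
import time

END_OF_STREAM = object()  # hypothetical end-of-stream marker


class SketchExecutableManager:
    """Hypothetical stand-in for Pig's ExecutableManager; not Pig's real API."""

    def __init__(self, side_outputs):
        self._lock = threading.Lock()
        self._closed = False
        self.local_dir = []  # simulated local working directory
        self.hdfs = []       # simulated HDFS destination
        # A background thread plays the streaming script, which may still be
        # writing its side outputs after the main output has been consumed.
        self._writer = threading.Thread(
            target=self._write_side_outputs, args=(list(side_outputs),)
        )
        self._writer.start()

    def _write_side_outputs(self, names):
        for name in names:
            time.sleep(0.01)  # simulate a slow script
            with self._lock:
                self.local_dir.append(name)

    def close(self):
        """Wait for the script, then move side outputs to HDFS exactly once."""
        self._writer.join()
        with self._lock:
            if self._closed:  # repeated calls are harmless no-ops
                return
            self._closed = True
            self.hdfs.extend(self.local_dir)
            self.local_dir.clear()


def finish_stream(manager):
    """Close the manager first, and only then return end-of-stream."""
    manager.close()
    return END_OF_STREAM


def run_task(side_outputs, close_before_eos=True):
    """Simulate a task that shuts down as soon as it sees end-of-stream.

    Returns the side outputs that reached simulated HDFS before shutdown.
    """
    manager = SketchExecutableManager(side_outputs)
    if close_before_eos:
        signal = finish_stream(manager)
    else:
        signal = END_OF_STREAM  # racy ordering: end-of-stream goes out first
    assert signal is END_OF_STREAM
    survived = list(manager.hdfs)  # the task shuts down here
    manager.close()  # too late for the racy ordering; also reaps the thread
    return survived


if __name__ == "__main__":
    print(run_task(["side-1", "side-2"]))                          # ['side-1', 'side-2']
    print(run_task(["side-1", "side-2"], close_before_eos=False))  # []
```

The lock plus the `_closed` flag make a repeated close() a harmless no-op, roughly what a synchronized, guarded close would give in Java. That only protects against duplicate calls; it does not decide where close() has to be called, which is the point raised in the comment.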



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
