Josh Wills created PIG-4412:
-------------------------------

             Summary: Race condition in writing multiple outputs from STREAM op
                 Key: PIG-4412
                 URL: https://issues.apache.org/jira/browse/PIG-4412
             Project: Pig
          Issue Type: Bug
          Components: impl
            Reporter: Josh Wills


This is essentially the issue described here:

http://stackoverflow.com/questions/28327044/pig-streaming-some-output-files-are-missing

Roughly, I believe there is a race between two things: the code in 
HadoopExecutableManager that moves a script's multiple output files into HDFS, 
and the MapReduce task, which shuts down once it has written the last bits of 
the "main" output of the STREAM task. If the task shuts down first, some of the 
output files can go missing. Pig needs to make sure that the ExecutableManager 
is closed (and thus the files are moved from the local dir to HDFS) before it 
returns the end-of-stream tuple that signals the stream is finished.
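
The ordering proposed above can be illustrated with a small, self-contained
sketch. This is not Pig's code: SideOutputManager, finishStream, and the event
names are hypothetical stand-ins, and the file move is simulated by a
background thread. It only shows the idea that close() blocks until the side
outputs have been moved, and that end-of-stream is reported only afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StreamShutdownSketch {

    /** Hypothetical stand-in for the executable manager; not Pig's real API. */
    static class SideOutputManager {
        private final List<String> events;
        private final Thread mover;

        SideOutputManager(List<String> events) {
            this.events = events;
            // Simulates moving the script's extra output files from the
            // local dir to HDFS on a separate thread.
            this.mover = new Thread(() -> events.add("side-outputs-moved"));
        }

        /** Starts the move and blocks until it has completed. */
        void close() {
            mover.start();
            try {
                mover.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while moving outputs", e);
            }
        }
    }

    /** Returns the events in the order in which they happened. */
    static List<String> finishStream() {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        SideOutputManager manager = new SideOutputManager(events);

        events.add("main-output-drained");
        // Assumed fix: close the manager BEFORE signalling end-of-stream, so
        // the task cannot shut down while files are still being moved.
        manager.close();
        events.add("end-of-stream-returned");

        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(finishStream());
    }
}
```

In this sketch, swapping the close() call and the end-of-stream event would
reproduce the problematic ordering: the caller could observe end-of-stream
while the move is still pending.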



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
