Josh Wills created PIG-4412:
-------------------------------
Summary: Race condition in writing multiple outputs from STREAM op
Key: PIG-4412
URL: https://issues.apache.org/jira/browse/PIG-4412
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Josh Wills
This is essentially the issue described here:
http://stackoverflow.com/questions/28327044/pig-streaming-some-output-files-are-missing
Roughly, I believe there is a race condition between the code in
HadoopExecutableManager that moves the script's additional output files into
HDFS and the MapReduce task, which shuts down as soon as it has written the
last records of the STREAM operator's "main" output. Pig needs to make sure
that the ExecutableManager is closed (and thus that the files have been moved
from the local directory to HDFS) before it returns the end-of-stream tuple
that signals the stream is finished.
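A minimal Java sketch of the ordering proposed above. Everything in it is hypothetical: `SketchExecutableManager`, `SketchStreamOperator`, `EOS` and `sideOutputsVisibleAtEos` are illustrative stand-ins, not Pig's actual HadoopExecutableManager or STREAM operator API, and "HDFS" is just an in-memory list. The race is modeled deterministically. Whatever has reached HDFS at the moment the end-of-stream tuple is returned is what survives if the task shuts down right afterwards.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: these classes are NOT Pig's real API. They only model
// the "close the ExecutableManager before returning EOS" ordering.
class StreamEosOrdering {

    static final String EOS = "<EOS>"; // stand-in for the end-of-stream tuple

    /** Stand-in for HadoopExecutableManager: side outputs sit in a "local dir". */
    static final class SketchExecutableManager {
        private final List<String> localSideOutputs;
        private final List<String> hdfs; // simulated HDFS destination
        private boolean closed = false;

        SketchExecutableManager(List<String> localSideOutputs, List<String> hdfs) {
            this.localSideOutputs = new ArrayList<>(localSideOutputs);
            this.hdfs = hdfs;
        }

        /** Moves every side-output file "to HDFS"; a no-op on repeated calls. */
        void close() {
            if (closed) return;
            hdfs.addAll(localSideOutputs);
            localSideOutputs.clear();
            closed = true;
        }
    }

    /** Stand-in for the STREAM operator's tuple-returning method. */
    static final class SketchStreamOperator {
        private final Iterator<String> mainOutput;
        private final SketchExecutableManager manager;
        private final boolean closeBeforeEos;

        SketchStreamOperator(Iterator<String> mainOutput,
                             SketchExecutableManager manager,
                             boolean closeBeforeEos) {
            this.mainOutput = mainOutput;
            this.manager = manager;
            this.closeBeforeEos = closeBeforeEos;
        }

        String getNext() {
            if (mainOutput.hasNext()) {
                return mainOutput.next();
            }
            if (closeBeforeEos) {
                // Proposed ordering: finish moving side outputs, THEN signal EOS.
                manager.close();
            }
            return EOS;
        }
    }

    /**
     * Drives the operator the way a task would and returns how many side
     * outputs had reached "HDFS" when EOS was observed, i.e. what survives
     * if the task shuts down immediately afterwards.
     */
    static int sideOutputsVisibleAtEos(List<String> mainTuples, List<String> side,
                                       boolean closeBeforeEos) {
        List<String> hdfs = new ArrayList<>();
        SketchExecutableManager manager = new SketchExecutableManager(side, hdfs);
        SketchStreamOperator op =
            new SketchStreamOperator(mainTuples.iterator(), manager, closeBeforeEos);
        while (!EOS.equals(op.getNext())) {
            // A real task would write each main-output tuple here.
        }
        int visibleAtEos = hdfs.size();
        // Late close: a no-op if already closed; in the racy ordering this is
        // where the move happens, too late for a task that has already exited.
        manager.close();
        return visibleAtEos;
    }

    public static void main(String[] args) {
        List<String> mainTuples = List.of("t1", "t2");
        List<String> side = List.of("side-1", "side-2");
        System.out.println(sideOutputsVisibleAtEos(mainTuples, side, true));  // 2
        System.out.println(sideOutputsVisibleAtEos(mainTuples, side, false)); // 0
    }
}
```

With `closeBeforeEos` set to true, both side outputs are in place when EOS is returned. With it set to false, none are, which corresponds to the missing-output symptom described in the report.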
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)