Streaming: user-produced stderr should be available while the job is still
running, with no extra text inserted
---------------------------------------------------------------------------------------------------------------
Key: HADOOP-2279
URL: https://issues.apache.org/jira/browse/HADOOP-2279
Project: Hadoop
Issue Type: Improvement
Components: contrib/streaming
Reporter: arkady borkovsky
This functionality should work as follows:
* when a streaming job is run, two additional DFS directories are created --
one for mapper stderr, another for reducer stderr. (The names of the
directories may be specified by the user, or may default to something
derived from the output name -- e.g. if the output directory is XYZ, the stderr
directories may be XYZ.log.map and XYZ.log.reduce)
* for each task, a file is created in the corresponding stderr directory
* the stderr produced by a (map or reduce) task shows up in the DFS as the
task is running. From the user's perspective, it should look like the lines
written by the streaming command are appended to the corresponding DFS file.
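The behavior proposed above can be sketched roughly as below. This is not the actual Hadoop implementation -- it is a minimal illustration using local java.nio files in place of DFS paths, and the names (StderrCapture, logDirFor, pump, the task_N.stderr file pattern) are hypothetical:

```java
import java.io.*;
import java.nio.file.*;

// Sketch only: derive per-job stderr directories from the output
// directory name, as proposed, and copy a task's stderr stream into
// its log file line by line so the file grows while the task is
// still running. Local paths stand in for DFS paths.
public class StderrCapture {

    // XYZ -> XYZ.log.map or XYZ.log.reduce, per the proposed default.
    static String logDirFor(String outputDir, boolean isMap) {
        return outputDir + (isMap ? ".log.map" : ".log.reduce");
    }

    // One file per task inside the corresponding stderr directory.
    static Path logFileFor(String outputDir, boolean isMap, int taskId)
            throws IOException {
        Path dir = Paths.get(logDirFor(outputDir, isMap));
        Files.createDirectories(dir);
        return dir.resolve("task_" + taskId + ".stderr");
    }

    // Append each stderr line to the log file as it arrives, flushing
    // after every line so a concurrent reader sees the output while
    // the task is still running -- the key requirement above.
    static void pump(BufferedReader stderr, Path logFile) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(
                logFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            String line;
            while ((line = stderr.readLine()) != null) {
                out.write(line);
                out.newLine();
                out.flush();   // no buffering delay for the reader
            }
        }
    }
}
```

In a real implementation the flush would map onto whatever incremental-append mechanism the DFS supports, which is the hard part this issue leaves open.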
This may be useful outside streaming as well. However,
(a) in Java applications, a task has other means of communicating with the
main program;
(b) the implementation of capturing stderr differs between Java MapReduce
and Streaming.