Streaming: user-produced stderr should be available while the job is still 
running, with no extra text inserted
---------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-2279
                 URL: https://issues.apache.org/jira/browse/HADOOP-2279
             Project: Hadoop
          Issue Type: Improvement
          Components: contrib/streaming
            Reporter: arkady borkovsky


This functionality should look like this:
   * when a streaming job is run, two additional DFS directories are created -- 
one for mapper stderr, another for reducer stderr.   (The names of the 
directories may be specified by the user, or may default to something 
derived from the output name -- e.g. if the output directory is XYZ, the stderr 
directories may be XYZ.log.map and XYZ.log.reduce)
   * for each task, a file is created in the corresponding stderr directory
   * the stderr produced by a (map or reduce) task shows up in the DFS while 
the task is running.  From the user's perspective, it should look like the 
lines written by the streaming command are appended to the corresponding DFS 
file.
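The behavior described above could be sketched as follows, using the local filesystem in place of DFS. This is only an illustrative sketch of the proposal, not Hadoop code: the class and method names (`StreamStderr`, `stderrDir`, `pipeStderr`) are hypothetical, and a real implementation would write to DFS rather than a local `Path`.

```java
import java.io.*;
import java.nio.file.*;

// Hypothetical sketch of the proposed streaming-stderr behavior:
// derive per-phase stderr directory names from the output directory,
// and append each stderr line from the child process to a log file
// as it is produced, so the log is readable while the task runs.
public class StreamStderr {

    // Proposed default naming: output dir "XYZ" gives stderr dirs
    // "XYZ.log.map" and "XYZ.log.reduce".
    static String stderrDir(String outputDir, boolean isMap) {
        return outputDir + (isMap ? ".log.map" : ".log.reduce");
    }

    // Copy the child's stderr to logFile line by line, flushing after
    // each line so a concurrent reader sees the lines immediately,
    // with no extra text inserted.
    static void pipeStderr(Process p, Path logFile) throws IOException {
        try (BufferedReader err = new BufferedReader(
                 new InputStreamReader(p.getErrorStream()));
             BufferedWriter out = Files.newBufferedWriter(
                 logFile,
                 StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            String line;
            while ((line = err.readLine()) != null) {
                out.write(line);
                out.newLine();
                out.flush(); // make the line visible to readers right away
            }
        }
    }
}
```

For example, a task launcher would pass the task attempt's log file under `stderrDir(jobOutput, true)` to `pipeStderr` while the mapper process runs.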


This may be useful outside streaming as well.  However, 
(a) in Java applications, a task has other features it can use to 
communicate with the main program, and 
(b) the implementation of capturing stderr differs between Java MapReduce 
and Streaming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
