[jira] Issue Comment Edited: (HADOOP-4620) Streaming mapper never completes if the mapper does not write to stdout

Ruyue Ma (JIRA) Thu, 04 Dec 2008 22:03:10 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653642#action_12653642
 ]


mry.maillist edited comment on HADOOP-4620 at 12/4/08 10:02 PM:
------------------------------------------------------------

If the mapper has no data to handle, we do not need to start the native 
process at all. This would improve performance and avoid this problem.

So I suggest starting the native process in PipeMapper.map() instead.

We can move the following part of the PipeMapRed code into the map function:

      // Start the process
      ProcessBuilder builder = new ProcessBuilder(argvSplit);
      builder.environment().putAll(childEnv.toMap());
      sim = builder.start();

      clientOut_ = new DataOutputStream(new BufferedOutputStream(
                                              sim.getOutputStream(),
                                              BUFFER_SIZE));
      clientIn_ = new DataInputStream(new BufferedInputStream(
                                              sim.getInputStream(),
                                              BUFFER_SIZE));
      clientErr_ = new DataInputStream(new BufferedInputStream(
                                              sim.getErrorStream()));
      startTime_ = System.currentTimeMillis();

      errThread_ = new MRErrorThread();
      errThread_.start();
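
To make the idea concrete, here is a minimal, self-contained sketch of the
lazy start. It is not the actual PipeMapRed/PipeMapper code: the class name
LazyPipeSketch, the isStarted() helper, the -1 return value, and the
BUFFER_SIZE value are all assumptions made for illustration. The child
process is created on the first call to map(), so a task with empty input
never starts a process and close() has nothing to wait for.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch, not the actual PipeMapRed/PipeMapper code.
class LazyPipeSketch {
    private static final int BUFFER_SIZE = 128 * 1024; // assumed value

    private final List<String> argv;
    private Process sim;                 // stays null until the first record arrives
    private DataOutputStream clientOut;

    LazyPipeSketch(List<String> argv) {
        this.argv = argv;
    }

    boolean isStarted() {
        return sim != null;
    }

    // Called once per input record; starts the child on the first call only.
    void map(String record) {
        try {
            if (sim == null) {
                ProcessBuilder builder = new ProcessBuilder(argv);
                // Discard child output to keep the sketch short (Java 9+);
                // real code would drain stdout and stderr in reader threads.
                builder.redirectErrorStream(true);
                builder.redirectOutput(ProcessBuilder.Redirect.DISCARD);
                sim = builder.start();
                clientOut = new DataOutputStream(
                        new BufferedOutputStream(sim.getOutputStream(), BUFFER_SIZE));
            }
            clientOut.write(record.getBytes(StandardCharsets.UTF_8));
            clientOut.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the child's exit code, or -1 (a sentinel) if no child was started.
    int close() {
        if (sim == null) {
            return -1; // empty input: nothing to flush or wait for
        }
        try {
            clientOut.close();           // flushes and signals EOF to the child
            return sim.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }
}
```

With this shape, the empty-input case returns from close() immediately
because there is no process and no reader thread to join.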



      was (Author: mry.maillist):
    if mapper has no data to handle. We can doesn't start native process. This 
will improve performance and avoid this problem.

so i suggest that it should start the native process in PiperMap->map().

we can move PipeMapRed part code to map function

      // Start the process
      ProcessBuilder builder = new ProcessBuilder(argvSplit);
      builder.environment().putAll(childEnv.toMap());
      sim = builder.start();

      clientOut_ = new DataOutputStream(new BufferedOutputStream(
                                              sim.getOutputStream(),
                                              BUFFER_SIZE));
      clientIn_ = new DataInputStream(new BufferedInputStream(
                                              sim.getInputStream(),
                                              BUFFER_SIZE));
      clientErr_ = new DataInputStream(new BufferedInputStream(
                                              sim.getErrorStream()));
      startTime_ = System.currentTimeMillis();

      errThread_ = new MRErrorThread();
      errThread_.start();


  
> Streaming mapper never completes if the mapper does not write to stdout
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4620
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2
>            Reporter: Runping Qi
>            Assignee: Ravi Gummadi
>
> A mapper of a streaming job has empty input data and thus it produces no 
> output.
> The task never completes.
> The following are the last two lines from the task log:
> 2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/perl, xxx]
> 2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
