[ 
https://issues.apache.org/jira/browse/HADOOP-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653288#action_12653288
 ] 

Ravi Gummadi commented on HADOOP-4620:
--------------------------------------

Adding the following new class to streaming and starting the output threads in 
its run() method fixes the mapper hang (seen when the input is 0 bytes but the 
mapper produces nonzero output).

+// Map runner for streaming: starts the output threads before any record is mapped.
+public class PipeMapRunner<K1, V1, K2, V2> extends MapRunner<K1, V1, K2, V2> {
+  public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output,
+                  Reporter reporter)
+         throws IOException {
+    PipeMapper pipeMapper = (PipeMapper)getMapper();
+    // Start the output threads up front so they run even if the input is empty.
+    pipeMapper.startOutputThreads(output, reporter);
+    super.run(input, output, reporter);
+  }
+}
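
To illustrate the general idea behind the patch, here is a minimal, self-contained sketch. It is not Hadoop code: OutputThreadDemo, ToyMapper and the EOF marker are hypothetical names, and the toy "child process" is just a queue that already holds one output line. The assumption being modelled is that an output reader started lazily from map() never starts when there are no input records, whereas starting it up front (as PipeMapRunner.run() does above) drains the output regardless. To stay runnable, the toy reports lost output rather than hanging.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OutputThreadDemo {
    // Hypothetical end-of-stream marker for the toy child output.
    private static final String EOF = "<EOF>";

    /** Toy mapper whose "child process" emits one line even with no input. */
    static class ToyMapper {
        final BlockingQueue<String> childStdout = new LinkedBlockingQueue<>();
        final List<String> collected =
                Collections.synchronizedList(new ArrayList<>());
        Thread outputThread;

        ToyMapper() {
            childStdout.add("summary\t0");
            childStdout.add(EOF);
        }

        /** Starts the reader that drains the child's output (idempotent). */
        void startOutputThread() {
            if (outputThread != null) {
                return;
            }
            outputThread = new Thread(() -> {
                try {
                    String line;
                    while (!(line = childStdout.take()).equals(EOF)) {
                        collected.add(line);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            outputThread.start();
        }

        /** Lazy start: only reached when at least one record arrives. */
        void map(String record) {
            startOutputThread();
        }

        /** Waits for the reader, if one was ever started. */
        void close() {
            if (outputThread == null) {
                return;
            }
            try {
                outputThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs the toy mapper over the input, optionally starting the reader first. */
    static List<String> run(List<String> input, boolean eagerStart) {
        ToyMapper mapper = new ToyMapper();
        if (eagerStart) {
            mapper.startOutputThread(); // analogous to the runner's up-front start
        }
        for (String record : input) {
            mapper.map(record);
        }
        mapper.close();
        return new ArrayList<>(mapper.collected);
    }

    public static void main(String[] args) {
        List<String> empty = Collections.emptyList();
        System.out.println("lazy start collected:  " + run(empty, false).size());
        System.out.println("eager start collected: " + run(empty, true).size());
    }
}
```

With empty input, the lazy variant never starts the reader, so the child's line is never collected; the eager variant collects it. The real task hang may involve more than this (for example, how task completion waits on the child process), so treat the sketch as an analogy only.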


I am investigating whether something similar can be done in the reduce phase 
as well (the reducer also hangs if its input is 0 bytes and it produces 
output).
Thoughts?

> Streaming mapper never completes if the mapper does not write to stdout
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4620
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2
>            Reporter: Runping Qi
>            Assignee: Ravi Gummadi
>
> A mapper of a streaming job has empty input data and thus it produces no 
> output.
> The task never completes.
> The following are the last two lines from the task log:
> 2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: 
> PipeMapRed exec [/usr/bin/perl, xxx]
> 2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: 
> mapRedFinished
>  

