[ https://issues.apache.org/jira/browse/HADOOP-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653288#action_12653288 ]
Ravi Gummadi commented on HADOOP-4620:
--------------------------------------
Adding the following new class to streaming, and starting the output threads
in its run() method, fixes the mapper hang (which occurs when the input is
0 bytes but the mapper produces nonzero output).
+public class PipeMapRunner<K1, V1, K2, V2> extends MapRunner<K1, V1, K2, V2> {
+  public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output,
+                  Reporter reporter)
+      throws IOException {
+    PipeMapper pipeMapper = (PipeMapper)getMapper();
+    pipeMapper.startOutputThreads(output, reporter);
+    super.run(input, output, reporter);
+  }
+}
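
To illustrate why starting the threads in run() matters, here is a minimal
standalone sketch, not the actual streaming code. It assumes the output
thread would otherwise be started only from map(), which is never called
when the input has no records. The names EagerStartSketch, SketchMapper,
runLazy and runEager are hypothetical, and the child process is modelled as
a fixed list of output lines. The sketch does not reproduce the hang itself;
it only shows that with lazy start and empty input the output is never read.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class EagerStartSketch {

    /** Hypothetical stand-in for a mapper that drains a child's output on a thread. */
    static class SketchMapper {
        private final List<String> childOutput;   // lines the child would print
        private final List<String> collected = new ArrayList<>();
        private Thread outputThread;               // null until started

        SketchMapper(List<String> childOutput) {
            this.childOutput = childOutput;
        }

        /** Starts the draining thread once; later calls are no-ops. */
        synchronized void startOutputThread() {
            if (outputThread != null) {
                return;
            }
            outputThread = new Thread(() -> {
                synchronized (collected) {
                    collected.addAll(childOutput);
                }
            });
            outputThread.start();
        }

        /** Lazy start (assumed): the thread comes up only when a record arrives. */
        void map(String record) {
            startOutputThread();
        }

        /** Waits for the draining thread, if any, and returns what it collected. */
        synchronized List<String> close() {
            if (outputThread != null) {
                try {
                    outputThread.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(e);
                }
            }
            synchronized (collected) {
                return new ArrayList<>(collected);
            }
        }
    }

    /** Runner that relies on map() to start the output thread. */
    static List<String> runLazy(Iterator<String> input, SketchMapper mapper) {
        while (input.hasNext()) {
            mapper.map(input.next());
        }
        return mapper.close();
    }

    /** Runner that starts the output thread before reading any input. */
    static List<String> runEager(Iterator<String> input, SketchMapper mapper) {
        mapper.startOutputThread();
        return runLazy(input, mapper);
    }
}
```

With an empty input iterator, runLazy never starts the thread and returns
nothing, while runEager still returns everything the child produced.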
I am investigating whether something similar can be done in the reduce phase
(the reducer also hangs if its input is 0 bytes and it produces output).
Thoughts?
> Streaming mapper never completes if the mapper does not write to stdout
> -----------------------------------------------------------------------
>
> Key: HADOOP-4620
> URL: https://issues.apache.org/jira/browse/HADOOP-4620
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.2
> Reporter: Runping Qi
> Assignee: Ravi Gummadi
>
> A mapper of a streaming job has empty input data and thus it produces no
> output.
> The task never completes.
> The following are the last two lines from the task log:
> 2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/perl, xxx]
> 2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
>