[ https://issues.apache.org/jira/browse/HADOOP-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653677#action_12653677 ]
Devaraj Das commented on HADOOP-4620:
-------------------------------------
I'd propose that, for reducers, we solve only the "hang" problem for the
empty-input case. I don't think we need to support the collector for reduces
with empty inputs. A reducer is supposed to aggregate/reduce the data that the
maps generate for it, so if the maps generated nothing for a given reducer, it
seems fine for that reducer to produce no output.
For maps, the picture is different with empty inputs, and we should support
that case. The use case here is that Hadoop enables parallelizing a native
application (which might read its input from some source Hadoop is not aware
of), or simply running multiple instances of the native application on a
cluster. Such a job might set the number of reduces to 0.
What do others think?
> Streaming mapper never completes if the mapper does not write to stdout
> -----------------------------------------------------------------------
>
> Key: HADOOP-4620
> URL: https://issues.apache.org/jira/browse/HADOOP-4620
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.2
> Reporter: Runping Qi
> Assignee: Ravi Gummadi
> Attachments: solves_mapper_4620.patch
>
>
> A mapper of a streaming job has empty input data and thus it produces no
> output.
> The task never completes.
> The following are the last two lines from the task log:
> 2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/perl, xxx]
> 2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
>
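For anyone trying to reproduce the report, the mapper involved only needs to read stdin and never write to stdout. The sketch below is a hypothetical Python stand-in for the reporter's Perl script (shown only as `xxx` in the log); the function name `silent_mapper` and the idea of launching it via Hadoop Streaming's `-mapper` option against an empty input are assumptions for illustration, not taken from the issue or the attached patch.

```python
import io


def silent_mapper(stdin: io.TextIOBase, stdout: io.TextIOBase) -> int:
    """Hypothetical mapper: consume every input line, never emit a record.

    With empty input (the case in this report) the loop body never runs,
    so the task writes nothing at all to stdout. Returns the number of
    lines consumed.
    """
    consumed = 0
    for _line in stdin:
        consumed += 1  # each record is read and deliberately discarded
    return consumed
```

In a real streaming script this would be called as `silent_mapper(sys.stdin, sys.stdout)`, since Hadoop Streaming feeds map input on stdin and collects records from stdout; with empty input it exits immediately after producing no output, which matches the scenario described above.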
--
This message is automatically generated by JIRA.