[ 
https://issues.apache.org/jira/browse/HADOOP-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653677#action_12653677
 ] 

Devaraj Das commented on HADOOP-4620:
-------------------------------------

I'd propose that for reducers we solve only the "hang" problem in the 
empty-input case. I don't think we need to support the collector for reduces 
with empty inputs. A reducer is supposed to aggregate/reduce the data that the 
maps generate for it, and if the maps generated nothing for a given reducer, it 
seems okay for that reducer to produce no output. 
For the maps, the picture is a bit different with empty inputs, and we should 
support that case. The use case here is that Hadoop parallelizes a native 
application (which could be reading its input from some source that Hadoop is 
not aware of), or simply runs multiple instances of the native application on 
a cluster. A job of this kind might set the number of reduces to 0.
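To make the map-side use case concrete, here is a minimal sketch of a mapper 
that consumes its input but never writes to stdout. It is only an illustration: 
the function name run_mapper and the side_output stream are hypothetical 
stand-ins for whatever the native application really does with its data, and 
the sketch says nothing about how PipeMapRed is implemented.

```python
import io


def run_mapper(stdin, stdout, side_output):
    """Read every input record but emit nothing on stdout.

    side_output is a hypothetical stand-in for the application's own sink
    (a database, a file outside Hadoop's control, and so on).
    """
    records = 0
    for line in stdin:
        # The real work happens outside of Hadoop's view; nothing is
        # ever handed to the collector through stdout.
        side_output.write(line.upper())
        records += 1
    stdout.flush()  # flushing an untouched stdout still yields no output
    return records


# Empty input: the mapper sees no records and produces no stdout.
captured_stdout = io.StringIO()
side = io.StringIO()
count = run_mapper(io.StringIO(""), captured_stdout, side)
```

With empty input the mapper returns without having written a byte to stdout, 
which is the situation in which the task should still complete.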
What do others think?

> Streaming mapper never completes if the mapper does not write to stdout
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4620
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2
>            Reporter: Runping Qi
>            Assignee: Ravi Gummadi
>         Attachments: solves_mapper_4620.patch
>
>
> A mapper of a streaming job has empty input data and thus it produces no 
> output.
> The task never completes.
> The following are the last two lines from the task log:
> 2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: 
> PipeMapRed exec [/usr/bin/perl, xxx]
> 2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: 
> mapRedFinished
>  

