Real use scenario of streaming with Reduce=None

Runping Qi Fri, 20 Apr 2007 16:26:42 -0700

 

With HADOOP-1216, the framework will support a reduce=none feature, enabled by
setting numReduceTasks=0.


If a map/reduce job sets numReduceTasks=0, the framework will not create any
reducer tasks.

The mappers will not generate the map output files either. 

Rather, each mapper will generate one DFS file in the output dir specified
for the job and save the output of the mapper to the file as a part of the
final result.

This behavior will be the same whether a job is streaming or non-streaming.
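
To make the behavior described above concrete, here is a minimal Python sketch
that simulates it locally. It is not Hadoop code: run_map_only_job and
upper_mapper are hypothetical names, the local directory stands in for the DFS
output dir, and the part-NNNNN file naming is assumed for illustration.

```python
import os
import tempfile


def run_map_only_job(splits, mapper, output_dir):
    """Simulate a job with numReduceTasks=0: one output file per mapper."""
    os.makedirs(output_dir, exist_ok=True)
    part_files = []
    for task_id, split in enumerate(splits):
        # Each map task writes straight to its own file in the job output
        # dir. No intermediate map output files are produced, and no reducer
        # ever runs.
        path = os.path.join(output_dir, f"part-{task_id:05d}")
        with open(path, "w") as out:
            for record in split:
                for key, value in mapper(record):
                    out.write(f"{key}\t{value}\n")
        part_files.append(path)
    return part_files


def upper_mapper(line):
    # Hypothetical mapper: emit the upper-cased record and its length.
    yield line.upper(), len(line)


if __name__ == "__main__":
    splits = [["a", "bb"], ["ccc"]]  # two input splits, so two map tasks
    with tempfile.TemporaryDirectory() as out_dir:
        for path in run_map_only_job(splits, upper_mapper, out_dir):
            with open(path) as f:
                print(os.path.basename(path), f.read().splitlines())
```

With two input splits, the sketch leaves exactly two files in the output dir,
one per mapper, and together they make up the final result.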

I wonder whether this behavior serves all the needs of the current streaming
job user community.

If so, we can eliminate all the weird "features" currently hacked into the
streaming implementation, such as sending the output of mappers through a
socket (i.e. the useSingleSideOutputURI_ option).

 

Thoughts?

 

Runping
