With HADOOP-1216, the framework will support a reduce=none feature, enabled by setting numReduceTasks=0.
If a map/reduce job sets numReduceTasks=0, it will not create any reducer tasks, and the mappers will not generate map output files either. Instead, each mapper will write its output to one DFS file in the output dir specified for the job, and that file becomes part of the final result. This behavior will be the same whether a job is streaming or non-streaming. I wonder whether this behavior serves all the needs of the current streaming job user community. If so, we can eliminate all the weird "features" currently hacked into the streaming implementation, such as sending the output of mappers through a socket (i.e. the useSingleSideOutputURI_ option). Thoughts? Runping
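
To make the proposed behavior concrete, here is a rough sketch in plain Python that mimics a job with numReduceTasks=0. It is only a simulation on the local file system and does not use any Hadoop code. The names `run_map_only_job` and `upper_mapper` are hypothetical, and the `part-NNNNN` file naming is an assumption made for illustration.

```python
import os
import tempfile


def run_map_only_job(splits, mapper, output_dir):
    """Simulate a job with zero reducers: no shuffle and no intermediate
    map output, just one final output file per mapper."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    for task_id, split in enumerate(splits):
        # Assumption: one "part-NNNNN" file per map task in the output dir.
        path = os.path.join(output_dir, "part-%05d" % task_id)
        with open(path, "w") as out:
            for record in split:
                # The mapper's output goes straight into the final file.
                for key, value in mapper(record):
                    out.write("%s\t%s\n" % (key, value))
        written.append(path)
    return written


def upper_mapper(line):
    """Hypothetical mapper: key is the first word, value is the upper-cased line."""
    if line.strip():
        yield line.split()[0], line.upper()


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    # Two input splits, so two mappers and two output files.
    for p in run_map_only_job([["a b", "c d"], ["e f"]], upper_mapper, out_dir):
        print(p)
```

With two input splits the sketch produces two files in the output dir, one per mapper, which is the layout a streaming user would need to be happy with if the socket-based side output were removed.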