[ 
https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717546#action_12717546
 ] 

Devaraj Das commented on HADOOP-5979:
-------------------------------------

I can't clearly see how this would fit into the MR model as it is implemented 
in Hadoop. Today, the output-collector thread in PipeMapper collects the 
key/value pairs from the streaming mapper and emits them to the framework. The 
framework part of the data path, MapTask.MapOutputBuffer.collect, then invokes 
getPartition on the key/value and stores the pair in the key/value buffer (which 
is later sorted and spilled to disk).
In the approach you outlined, the Partitioner would update the key/value. What 
would MapTask collect in that case? We'd like to keep the original key/value 
intact, right? And where would getPartition get called?
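
To make the concern concrete, here is a simplified, hypothetical model of the 
collect step. It is not the real MapTask.MapOutputBuffer code: the class 
CollectPathSketch, the SimplePartitioner interface, and the String key/value 
types are assumptions made only for illustration. The point it shows is that 
getPartition only returns an index, and the record is buffered unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the collect step; NOT the real MapTask code.
class CollectPathSketch {
    /** Stand-in for a partitioner: it reads the key/value and returns an index. */
    interface SimplePartitioner {
        int getPartition(String key, String value, int numPartitions);
    }

    /** One buffered record: the partition number plus the untouched key/value. */
    static final class Record {
        final int partition;
        final String key;
        final String value;

        Record(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    private final SimplePartitioner partitioner;
    private final int numPartitions;
    // In this sketch the buffer is a plain list; sorting and spilling are omitted.
    private final List<Record> buffer = new ArrayList<>();

    CollectPathSketch(SimplePartitioner partitioner, int numPartitions) {
        this.partitioner = partitioner;
        this.numPartitions = numPartitions;
    }

    /** Called once per map output record. */
    void collect(String key, String value) {
        // The partitioner is consulted here, inside collect.
        int partition = partitioner.getPartition(key, value, numPartitions);
        if (partition < 0 || partition >= numPartitions) {
            throw new IllegalArgumentException("Illegal partition: " + partition);
        }
        // The original key/value are stored as they were emitted by the mapper.
        buffer.add(new Record(partition, key, value));
    }

    List<Record> buffered() {
        return buffer;
    }
}
```

In this model a partitioner that rewrote the key/value would have nowhere to 
hand the rewritten pair back, which is the gap the questions above point at.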
Another way to implement this feature is a special Java implementation of the 
partitioner whose getPartition method writes to the command and reads back the 
partition number. This model would be similar to the PipeMapper/PipeReducer 
models. The main difference is that getPartition would be a blocking call (as 
opposed to map or reduce, where writing to and reading from the process are 
asynchronous). 
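
A minimal sketch of that blocking round trip, under stated assumptions: the 
class name CommandPartitionerSketch, the String key/value types, and the wire 
format (one tab-separated key/value line out, one integer line back) are all 
hypothetical and not part of Streaming. It deliberately avoids the Hadoop 
Partitioner interface so that it stays self-contained; a real implementation 
would wrap the same logic in a Partitioner.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: asks an external command for the partition of each record.
class CommandPartitionerSketch {
    private final Writer toCommand;           // connected to the command's stdin
    private final BufferedReader fromCommand; // connected to the command's stdout

    CommandPartitionerSketch(Writer toCommand, BufferedReader fromCommand) {
        this.toCommand = toCommand;
        this.fromCommand = fromCommand;
    }

    /** Starts the command and wires its stdin/stdout to a new sketch instance. */
    static CommandPartitionerSketch forCommand(String... argv) throws IOException {
        Process process = new ProcessBuilder(argv)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        return new CommandPartitionerSketch(
                new BufferedWriter(new OutputStreamWriter(
                        process.getOutputStream(), StandardCharsets.UTF_8)),
                new BufferedReader(new InputStreamReader(
                        process.getInputStream(), StandardCharsets.UTF_8)));
    }

    /** Blocking call: one line is written, then one reply line is awaited. */
    int getPartition(String key, String value, int numPartitions) {
        try {
            // Assumed wire format: "key<TAB>value\n".
            toCommand.write(key + "\t" + value + "\n");
            toCommand.flush(); // without the flush the command may never see the record

            String reply = fromCommand.readLine(); // blocks until the command answers
            if (reply == null) {
                throw new IllegalStateException("partitioner command closed its output");
            }
            int partition = Integer.parseInt(reply.trim());
            if (partition < 0 || partition >= numPartitions) {
                throw new IllegalStateException("Illegal partition: " + partition);
            }
            return partition;
        } catch (IOException e) {
            // Rethrown unchecked so the method signature carries no checked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```

Each record costs a full write/flush/read round trip here, which is the price 
of the blocking model compared with the asynchronous mapper and reducer pipes.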
Thoughts?

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java 
> classes to be specified as mapper, reducer, and combiner, but the 
> {{-partitioner}} option is still limited to Java classes only. Allowing 
> commands to be specified as partitioner as well would greatly improve the 
> flexibility of Streaming programs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
