[ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717546#action_12717546 ]
Devaraj Das commented on HADOOP-5979:
-------------------------------------

I am not able to clearly see how this whole thing would fit in the MR model in the implementation we have in Hadoop. So the way it works is that the outputcollector thread in PipeMapper collects the key/vals from the streaming mapper and emits them to the framework. The framework part of the data-path, MapTask.MapOutputBuffer.collect, then invokes getPartition on the key/value and dumps it in the key/val buffer (which at a later point is sorted and spilled to disk).

In the approach you outlined, the Partitioner would update the key/value. What would be collected by MapTask? We'd like to keep the original key/value intact, right? Where would the getPartition get called?

Another approach for implementing this feature is that you have a special Java implementation of the partitioner that in its getPartition method writes to the command and reads back the partition number. This model will be similar to the PipeMapper/Reducer models. The main difference would be that the getPartition would be a blocking call (as opposed to the map or reduce where the write-to and read-from the process is asynchronous).

Thoughts?

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java
> classes to be specified as mapper, reducer, and combiner, but the
> {{-partitioner}} option is still limited to Java classes only. Allowing
> commands to be specified as partitioner as well would greatly improve the
> flexibility of Streaming programs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
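
For illustration, the alternative suggested in the comment above (a Java partitioner whose getPartition writes the record to an external command and blocks until it reads back a partition number) might look roughly like the sketch below. This is not Hadoop code and not a proposed patch: the class name CommandPartitioner, the tab-separated request line, and the single-integer reply line are all assumptions made for the example. To stay self-contained it does not implement Hadoop's Partitioner interface; in a real job the same logic would sit inside a Partitioner's getPartition(key, value, numPartitions).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: delegates partitioning to an external command.
 * Assumed protocol (not defined anywhere in Hadoop): one "key<TAB>value"
 * line is written to the command's stdin, and the command answers with
 * one line holding an integer.
 */
class CommandPartitioner implements AutoCloseable {
    private final Process process;
    private final BufferedWriter toCommand;
    private final BufferedReader fromCommand;

    CommandPartitioner(String... command) {
        Process started;
        try {
            // Start the command once; every getPartition call reuses it.
            started = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        process = started;
        toCommand = new BufferedWriter(new OutputStreamWriter(
                process.getOutputStream(), StandardCharsets.UTF_8));
        fromCommand = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Blocking call: write the record, flush, then wait for the reply. */
    int getPartition(String key, String value, int numPartitions) {
        try {
            toCommand.write(key);
            toCommand.write('\t');
            toCommand.write(value);
            toCommand.write('\n');
            toCommand.flush(); // the command sees nothing until we flush

            String reply = fromCommand.readLine(); // blocks until a line arrives
            if (reply == null) {
                throw new IOException("partitioner command exited unexpectedly");
            }
            // Assumption: fold any integer reply into [0, numPartitions).
            return Math.floorMod(Integer.parseInt(reply.trim()), numPartitions);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            toCommand.close(); // EOF on stdin lets the command finish
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            process.destroy();
        }
    }
}
```

Because the record passed to getPartition is only serialized for the command and never modified, the original key/value that MapTask collects stays intact. The cost is one synchronous round trip to the command per record, and the command must flush its output after every line or the call will hang.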