[ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717591#action_12717591 ]
Klaas Bosteels commented on HADOOP-5979:
----------------------------------------

Yeah, I was actually suggesting such a special Java implementation that writes to and reads from a command, but instead of letting the command generate the partition number directly, I thought it might make sense to let it output a key or even a key/value pair (which are completely separate from the other MapReduce keys and values) and determine the partition from that. So instead of generating the same number for pairs that need to go to the same reducer, the partitioner command would just have to generate the same key for those pairs. The benefits of such an approach would be that
# it's simpler (the partitioner command doesn't need to know how many partitions there are),
# it might be easier to define a suitable partitioner command (when using shell tools it might be easier to output a string instead of a specific number, for example),
# we could reuse more code that's already there (if we let the partitioner command output both a key and a value and pass that on to a wrapped partitioner, like in the code sample I gave above, we wouldn't even need any additional reading/writing logic).

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java
> classes to be specified as mapper, reducer, and combiner, but the
> {{-partitioner}} option is still limited to Java classes only. Allowing
> commands to be specified as partitioner as well would greatly improve the
> flexibility of Streaming programs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
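
For illustration, the key-based scheme from the comment could be sketched roughly as follows. This is only a rough sketch in plain Java with no Hadoop dependencies, not the code sample referred to in the comment and not anything that exists in Streaming: the class name CommandKeyPartitioner, the tab-separated line protocol, and the one-line-in/one-line-out exchange are all assumptions made for the example. The partition is derived from the returned key with a non-negative hash modulo, the same idea Hadoop's HashPartitioner uses, so the command never needs to know the number of partitions.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: partitions records by a key produced by an external command. */
class CommandKeyPartitioner implements AutoCloseable {
    private final Process process;
    private final BufferedWriter toCommand;
    private final BufferedReader fromCommand;

    /** Starts the partitioner command once; it is reused for every record. */
    CommandKeyPartitioner(String... command) throws IOException {
        process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        toCommand = new BufferedWriter(
                new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        fromCommand = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Equal partition keys always map to the same partition in [0, numPartitions). */
    static int partitionFor(String partitionKey, int numPartitions) {
        // Mask off the sign bit so that negative hash codes cannot yield a negative index.
        return (partitionKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Sends one record to the command and reads back one line, which is
     * treated as the partition key (assumed protocol: key TAB value in, key out).
     */
    int getPartition(String key, String value, int numPartitions) throws IOException {
        toCommand.write(key + "\t" + value + "\n");
        toCommand.flush();
        String partitionKey = fromCommand.readLine();
        if (partitionKey == null) {
            throw new IOException("partitioner command exited before answering");
        }
        return partitionFor(partitionKey, numPartitions);
    }

    @Override
    public void close() throws IOException {
        toCommand.close();
        fromCommand.close();
        process.destroy();
    }

    public static void main(String[] args) {
        // Only the key-to-partition step is exercised here; no command is started.
        int first = partitionFor("user42", 7);
        int second = partitionFor("user42", 7);
        System.out.println(first == second); // same key, same partition
    }
}
```

One caveat with this synchronous exchange: a command that block-buffers its standard output when writing to a pipe would stall the readLine() call, so the command would have to flush after every line.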