[ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716599#action_12716599 ]
Klaas Bosteels commented on HADOOP-5979:
----------------------------------------

I haven't thought much about the details yet, but the easiest way to implement it might be to add a {{PipePartitioner}} that extends {{PipeMapper}}, yes, much like {{PipeCombiner}} is an extension of {{PipeReducer}}. The {{PipePartitioner}} would have to implement {{Partitioner}}, however, so it would also need an {{int getPartition(Object key, Object value, int numPartitions)}} method, which could work somewhat similarly to the {{void map(...)}} method.

The way I see it, this method would use {{inWriter_}} to write the key and value to the standard input of the partitioner command. It would then rely on {{outReader_}} to read the key and value the command returns for this pair, and pass them to the {{int getPartition(...)}} method of a wrapped partitioner. Simplified, it could look something like this:

{code}
public int getPartition(K2 key, V2 value, int numPartitions) {
  // Send the pair to the partitioner command's standard input.
  if (!ignoreKey) {
    inWriter_.writeKey(key);
  }
  inWriter_.writeValue(value);
  // Expect exactly one transformed pair back on its standard output.
  if (!outReader_.readKeyValue()) {
    throw new RuntimeException("partitioner must output one key/value pair for each input pair");
  }
  Object newKey = outReader_.getCurrentKey();
  Object newValue = outReader_.getCurrentValue();
  // Let the wrapped partitioner decide based on the transformed pair.
  return realPartitioner.getPartition(newKey, newValue, numPartitions);
}
{code}

Streaming users could then easily define partitioners by specifying a partitioner command that transforms key/value pairs in such a way that the wrapped partitioner shows the desired behavior. The default wrapped partitioner should probably be {{HashPartitioner}}.

Does this make sense to you, Devaraj?
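To make the "transform the pair, then let the wrapped partitioner decide" idea concrete, here is a minimal, self-contained sketch. It is not Hadoop code and not the proposed patch: the class {{CommandPartitionerSketch}}, the method {{keepPrefix}}, and the tab-separated line format are all hypothetical, and the external partitioner command is replaced by an in-process function so the example runs without Hadoop. The only part taken from Hadoop is the arithmetic in {{hashPartition}}, which mirrors what {{HashPartitioner}} does.

```java
import java.util.function.UnaryOperator;

/** Illustrative stand-in for the proposed PipePartitioner; not Hadoop code. */
class CommandPartitionerSketch {
    // Hypothetical stand-in for the external partitioner command: it maps one
    // tab-separated "key\tvalue" line to exactly one transformed line.
    private final UnaryOperator<String> command;

    CommandPartitionerSketch(UnaryOperator<String> command) {
        this.command = command;
    }

    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit, then
    // take the remainder modulo the number of partitions.
    static int hashPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Example "command" (an assumption for illustration): keep only the part
    // of the key before the first '.', so "us.ny" and "us.ca" both become "us".
    static String keepPrefix(String line) {
        int tab = line.indexOf('\t');
        int dot = line.indexOf('.');
        if (tab < 0 || dot < 0 || dot > tab) {
            return line; // no '.' inside the key: leave the pair unchanged
        }
        return line.substring(0, dot) + line.substring(tab);
    }

    int getPartition(String key, String value, int numPartitions) {
        // "Write" the pair to the command and "read" the transformed pair back.
        String out = command.apply(key + "\t" + value);
        if (out == null) {
            throw new IllegalStateException(
                "partitioner must output one key/value pair for each input pair");
        }
        int tab = out.indexOf('\t');
        String newKey = tab < 0 ? out : out.substring(0, tab);
        // The wrapped partitioner only ever sees the transformed key.
        return hashPartition(newKey, numPartitions);
    }
}
```

With {{new CommandPartitionerSketch(CommandPartitionerSketch::keepPrefix)}}, the keys {{us.ny}} and {{us.ca}} are sent to the same partition, because the hash is computed on the transformed key {{us}} rather than on the original keys.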
> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java
> classes to be specified as mapper, reducer, and combiner, but the
> {{-partitioner}} option is still limited to Java classes only. Allowing
> commands to be specified as partitioner as well would greatly improve the
> flexibility of Streaming programs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.