[ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716599#action_12716599 ]

Klaas Bosteels commented on HADOOP-5979:
----------------------------------------

I haven't thought much about the details yet, but the easiest way to implement
it might be to add a {{PipePartitioner}} that extends {{PipeMapper}}, yes, much
like {{PipeCombiner}} extends {{PipeReducer}}. Since the {{PipePartitioner}}
would also have to implement {{Partitioner}}, it would need an
{{int getPartition(Object key, Object value, int numPartitions)}} method, which
could work much like the existing {{void map(...)}} method. As I see it, this
method would use {{inWriter_}} to write the key and value to the standard input
of the partitioner command, then use {{outReader_}} to read back the key and
value the command emits for that pair and pass them to the
{{int getPartition(...)}} method of a wrapped partitioner. Simplified, it could
look something like this:

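{code}
public int getPartition(K2 key, V2 value, int numPartitions) {
  try {
    // send the pair to the partitioner command's standard input
    if (!ignoreKey) {
      inWriter_.writeKey(key);
    }
    inWriter_.writeValue(value);
    // read back exactly one transformed pair from the command's standard output
    if (!outReader_.readKeyValue()) {
      throw new RuntimeException(
          "partitioner must output one key/val pair for each input pair");
    }
    Object newKey = outReader_.getCurrentKey();
    Object newValue = outReader_.getCurrentValue();
    // let the wrapped Java partitioner choose the partition for the new pair
    return realPartitioner.getPartition(newKey, newValue, numPartitions);
  } catch (IOException e) {
    // Partitioner.getPartition cannot throw checked exceptions
    throw new RuntimeException("partitioner command failed", e);
  }
}
{code}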

Streaming users could then easily define partitioners by specifying a
partitioner command that transforms key/value pairs so that the wrapped
partitioner assigns them to the desired partitions. The default wrapped
partitioner should probably be {{HashPartitioner}}.
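To illustrate the intended effect, here is a minimal, self-contained sketch under a few assumptions: keys are plain {{String}}s (real Streaming keys are {{Text}} objects, whose {{hashCode}} differs), and the partitioner command is modelled as an in-process function on tab-separated lines rather than an external process. The class name, the {{COMMAND}} function and its "keep the key up to the first dot" rule are all hypothetical; only the partition arithmetic mirrors what {{HashPartitioner}} does.

{code:java}
import java.util.function.UnaryOperator;

/**
 * Hypothetical simulation of the proposal: a "partitioner command" rewrites
 * each key/value pair, and the rewritten key is handed to HashPartitioner-style
 * logic. Not the actual Streaming implementation.
 */
public class PipePartitionerSketch {

  // Same arithmetic as HashPartitioner: clear the sign bit, then take the remainder.
  static int hashPartition(Object key, int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  // Stand-in for the external command (hypothetical): reads "key\tvalue" and
  // writes "key\tvalue", keeping only the part of the key before the first '.'.
  static final UnaryOperator<String> COMMAND = line -> {
    int tab = line.indexOf('\t');
    String key = line.substring(0, tab);
    String value = line.substring(tab + 1);
    int dot = key.indexOf('.');
    String newKey = dot >= 0 ? key.substring(0, dot) : key;
    return newKey + "\t" + value;
  };

  // Write the pair "to the command", read one pair back, partition on the new key.
  static int getPartition(String key, String value, int numPartitions) {
    String out = COMMAND.apply(key + "\t" + value);
    String newKey = out.substring(0, out.indexOf('\t'));
    return hashPartition(newKey, numPartitions);
  }

  public static void main(String[] args) {
    int numPartitions = 4;
    // Both keys are rewritten to "user1", so they land in the same partition.
    System.out.println(getPartition("user1.click", "a", numPartitions));
    System.out.println(getPartition("user1.view", "b", numPartitions));
  }
}
{code}

Because the command strips everything after the first dot, "user1.click" and "user1.view" end up in the same partition without any custom Java partitioner class.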

Does this make sense to you, Devaraj?

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java 
> classes to be specified as mapper, reducer, and combiner, but the 
> {{-partitioner}} option is still limited to Java classes only. Allowing 
> commands to be specified as partitioner as well would greatly improve the 
> flexibility of Streaming programs.

-- 
This message is automatically generated by JIRA.