[ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716599#action_12716599 ]
Klaas Bosteels commented on HADOOP-5979:
----------------------------------------
I haven't thought much about the details yet, but the easiest way to implement
it might be to add a {{PipePartitioner}} that extends {{PipeMapper}}, yes, much
like {{PipeCombiner}} extends {{PipeReducer}}. Since {{PipePartitioner}} would
also have to implement {{Partitioner}}, it would need an
{{int getPartition(Object key, Object value, int numPartitions)}} method as
well, which could work much like the {{void map(...)}} method. The way I see
it, this method would use {{inWriter_}} to write the key and value to the
standard input of the partitioner command, then use {{outReader_}} to read
back the key and value the command outputs for this pair, and finally pass
them to the {{int getPartition(...)}} method of a wrapped partitioner. In
simplified form, it could look something like this:
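{code}
// Simplified sketch; ignoreKey, inWriter_ and outReader_ come from PipeMapper/PipeMapRed.
public int getPartition(Object key, Object value, int numPartitions) {
  try {
    // Send the pair to the partitioner command's standard input.
    if (!ignoreKey) {
      inWriter_.writeKey(key);
    }
    inWriter_.writeValue(value);
    // Read back exactly one key/value pair from the command's standard output.
    if (!outReader_.readKeyValue()) {
      throw new RuntimeException(
          "partitioner must output one key/value pair for each input pair");
    }
    Object newKey = outReader_.getCurrentKey();
    Object newValue = outReader_.getCurrentValue();
    // Let the wrapped Java partitioner (e.g. HashPartitioner) pick the partition.
    return realPartitioner.getPartition(newKey, newValue, numPartitions);
  } catch (IOException e) {
    // Partitioner.getPartition() cannot throw checked exceptions.
    throw new RuntimeException(e);
  }
}
{code}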
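To make the proposed protocol a bit more concrete, here is a rough, self-contained simulation of it in plain Java, without any Hadoop classes. It assumes Streaming's default text protocol (one "key<TAB>value" line goes in, one line comes out) and models the partitioner command as a hypothetical in-process function rather than a real subprocess; {{PipePartitionerSketch}} and its methods are made-up names, and {{hashPartition}} just reproduces the formula {{HashPartitioner}} uses.

{code:java}
import java.util.Optional;
import java.util.function.Function;

public class PipePartitionerSketch {

    // Same formula as Hadoop's HashPartitioner: non-negative hash modulo partitions.
    public static int hashPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Models the proposed getPartition(): write one "key\tvalue" line to the command,
    // read back exactly one line, and hand the resulting key to the wrapped partitioner.
    // The command is a hypothetical stand-in: it returns the line the real command
    // would print, or Optional.empty() if it prints nothing.
    public static int getPartition(Function<String, Optional<String>> command,
                                   String key, String value, int numPartitions) {
        String line = command.apply(key + "\t" + value)
            .orElseThrow(() -> new RuntimeException(
                "partitioner must output one key/value pair for each input pair"));
        // As with Streaming's text output, everything before the first tab is the key;
        // a line without a tab is treated as all key.
        int tab = line.indexOf('\t');
        String newKey = tab >= 0 ? line.substring(0, tab) : line;
        // The value would be passed on too, but HashPartitioner only looks at the key.
        return hashPartition(newKey, numPartitions);
    }

    public static void main(String[] args) {
        // An identity command (like `cat`) leaves the pair unchanged.
        Function<String, Optional<String>> identity = Optional::of;
        System.out.println(getPartition(identity, "apple", "1", 4) == hashPartition("apple", 4));

        // A command that prints nothing violates the one-pair-per-input contract.
        Function<String, Optional<String>> silent = line -> Optional.empty();
        try {
            getPartition(silent, "apple", "1", 4);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

With an identity command such as {{cat}}, the result is the same as using {{HashPartitioner}} directly, while a command that prints nothing for a pair triggers the same error as in the snippet above.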
Streaming users could then easily define their own partitioners by specifying
a partitioner command that transforms each key/value pair so that the wrapped
partitioner produces the desired partitioning. The default wrapped partitioner
should probably be {{HashPartitioner}}.
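For example, a user who wants all keys sharing a prefix such as {{user42}} to end up in the same partition could supply a command that rewrites the key {{user42.click}} to {{user42}} and passes the value through unchanged. The sketch below assumes exactly that behavior and models the command as a hypothetical Java method {{transformKey}} (in practice it would be a separate script); {{PrefixPartitionExample}} is likewise a made-up name, and {{hashPartition}} mirrors the {{HashPartitioner}} formula.

{code:java}
public class PrefixPartitionExample {

    // Same formula as Hadoop's HashPartitioner.
    public static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical behavior of a user-supplied partitioner command:
    // keep only the part of the key before the first '.'.
    public static String transformKey(String key) {
        int dot = key.indexOf('.');
        return dot >= 0 ? key.substring(0, dot) : key;
    }

    // Partition on the rewritten key, as the wrapped HashPartitioner would.
    public static int partitionFor(String key, int numPartitions) {
        return hashPartition(transformKey(key), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        String[] keys = {"user42.click", "user42.view", "user7.click", "user7.purchase"};
        for (String k : keys) {
            System.out.println(k + " -> " + partitionFor(k, numPartitions));
        }
    }
}
{code}

Here {{user42.click}} and {{user42.view}} always land in the same partition, as do {{user7.click}} and {{user7.purchase}}, whereas a plain {{HashPartitioner}} on the full keys would generally split them.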
Does this make sense to you, Devaraj?
> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
> Key: HADOOP-5979
> URL: https://issues.apache.org/jira/browse/HADOOP-5979
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java
> classes to be specified as mapper, reducer, and combiner, but the
> {{-partitioner}} option is still limited to Java classes only. Allowing
> commands to be specified as partitioner as well would greatly improve the
> flexibility of Streaming programs.
--
This message is automatically generated by JIRA.