[ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717591#action_12717591 ]

Klaas Bosteels commented on HADOOP-5979:
----------------------------------------

Yeah, I was actually suggesting such a special Java implementation that writes 
to and reads from a command, but instead of letting the command generate the 
partition number directly, I thought it might make sense to let it output a key 
or even a key/value pair (which are completely separate from the other 
MapReduce keys and values) and determine the partition from that. So instead of 
generating the same number for pairs that need to go to the same reducer, the 
partitioner command would just have to generate the same key for those pairs. 
The benefits of such an approach would be that
# it's simpler (the partitioner command doesn't need to know how many 
partitions there are),
# it might be easier to define a suitable partitioner command (when using shell 
tools it might be easier to output a string instead of a specific number for 
example),
# we could reuse more code that's already there (if we let the partitioner 
command output both a key and a value and pass those on to a wrapped 
partitioner, like in the code sample I gave above, we wouldn't even need any 
additional reading/writing logic).
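To make the proposed split of responsibilities concrete, here is a minimal, self-contained sketch. It is not the Streaming API and not the code sample referred to above. The external partitioner command is simulated by a hypothetical `PARTITIONER_COMMAND` function that emits the first tab-separated field of a record as the partition key. The wrapped partitioner applies the same formula Hadoop's `HashPartitioner` uses, `(key.hashCode() & Integer.MAX_VALUE) % numPartitions`, here applied to a plain Java `String` rather than a Writable. The class and method names are illustrative assumptions.

```java
import java.util.function.Function;

/**
 * Sketch (assumption, not Hadoop code): a hypothetical partitioner "command"
 * maps each record to a partition key, and a wrapped hash partitioner turns
 * that key into a partition number. The command never sees numPartitions.
 */
public class CommandKeyPartitionerSketch {

    // Hypothetical stand-in for the external partitioner command:
    // it emits the first tab-separated field of the record as the key.
    static final Function<String, String> PARTITIONER_COMMAND =
            record -> record.split("\t", 2)[0];

    // Same formula as Hadoop's HashPartitioner, applied to a String key.
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Record -> key emitted by the "command" -> partition from the wrapped partitioner.
    static int getPartition(String record, int numPartitions) {
        String partitionKey = PARTITIONER_COMMAND.apply(record);
        return hashPartition(partitionKey, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        String[] records = {"user1\tclick", "user2\tview", "user1\tview"};
        for (String r : records) {
            // Records sharing the same emitted key always land in the same partition.
            System.out.println(r.replace('\t', ' ') + " -> partition "
                    + getPartition(r, numPartitions));
        }
    }
}
```

The command only has to emit identical keys for records that belong together; the number of partitions is applied afterwards on the Java side, which is what makes the first benefit listed above possible.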

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java 
> classes to be specified as mapper, reducer, and combiner, but the 
> {{-partitioner}} option is still limited to Java classes only. Allowing 
> commands to be specified as partitioner as well would greatly improve the 
> flexibility of Streaming programs.

-- 
This message is automatically generated by JIRA.