[ 
https://issues.apache.org/jira/browse/HADOOP-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550258
 ] 

arkady borkovsky commented on HADOOP-2302:
------------------------------------------

This issue should probably be generalized to
"Streaming should provide a more powerful way to specify the primary key comparator"
(meaning the comparator used for splits).

It should support at least:
-- numeric comparison
-- reverse order comparison
-- multiple field comparison
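To see why numeric and reverse comparison matter, here is a minimal Java sketch. It assumes keys are ordered as plain strings, roughly like a lexical (byte-wise) key comparison on ASCII digits, and contrasts that with numeric ordering of the same keys, optionally reversed. The class and method names are hypothetical and not part of Hadoop streaming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of Hadoop streaming.
public class KeyOrderDemo {
    // Lexical ordering of key strings (approximates a byte-wise comparison for ASCII digits).
    static List<String> sortLexically(List<String> keys) {
        List<String> out = new ArrayList<>(keys);
        out.sort(Comparator.naturalOrder());
        return out;
    }

    // Numeric ordering of integer keys, optionally reversed (largest value first).
    static List<String> sortNumerically(List<String> keys, boolean reverse) {
        Comparator<String> cmp = Comparator.comparingLong(Long::parseLong);
        if (reverse) {
            cmp = cmp.reversed();
        }
        List<String> out = new ArrayList<>(keys);
        out.sort(cmp);
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("9", "10", "100", "2");
        System.out.println(sortLexically(keys));          // [10, 100, 2, 9]
        System.out.println(sortNumerically(keys, false)); // [2, 9, 10, 100]
        System.out.println(sortNumerically(keys, true));  // [100, 10, 9, 2]
    }
}
```

With lexical ordering, "10" and "100" land before "2", which is usually not what a user sorting numeric keys expects.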

One possible way to specify the comparison on the streaming command line is to
use the familiar syntax of the Unix sort command, for example
    "-k2,2rn -k1,1"
meaning "compare the second field numerically, largest value first; if equal,
compare the first field alphabetically".
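A sort-style spec could be turned into a line comparator roughly as follows. This is a minimal sketch under stated assumptions, not an existing streaming option: fields are tab-separated (sort(1) splits on blanks by default), field numbers are 1-based, only the "n" and "r" flags are handled, a missing end field means "to the end of the line", and unparseable numeric fields compare as zero. The class name KeySpecComparator is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: builds a line comparator from sort(1)-style "-kM[,N][nr]" specs.
public class KeySpecComparator {
    // One parsed key spec: a 1-based, inclusive field range plus the n/r flags.
    static final class KeySpec {
        final int start;
        final int end;
        final boolean numeric;
        final boolean reverse;

        KeySpec(int start, int end, boolean numeric, boolean reverse) {
            this.start = start;
            this.end = end;
            this.numeric = numeric;
            this.reverse = reverse;
        }
    }

    private static final Pattern SPEC = Pattern.compile("-k(\\d+)([nr]*)(?:,(\\d+)([nr]*))?");

    // Parse a spec string such as "-k2,2rn -k1,1" into an ordered list of keys.
    static List<KeySpec> parse(String specs) {
        List<KeySpec> result = new ArrayList<>();
        for (String token : specs.trim().split("\\s+")) {
            Matcher m = SPEC.matcher(token);
            if (!m.matches()) {
                throw new IllegalArgumentException("unsupported key spec: " + token);
            }
            int start = Integer.parseInt(m.group(1));
            // No end field means "to the end of the line", as in sort(1).
            int end = m.group(3) == null ? Integer.MAX_VALUE : Integer.parseInt(m.group(3));
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("bad field range: " + token);
            }
            String flags = m.group(2) + (m.group(4) == null ? "" : m.group(4));
            result.add(new KeySpec(start, end, flags.contains("n"), flags.contains("r")));
        }
        return result;
    }

    // Join fields start..end of an already split line; missing fields yield "".
    static String extract(String[] fields, KeySpec spec) {
        int from = Math.min(spec.start - 1, fields.length);
        int to = Math.min(spec.end, fields.length);
        return from >= to ? "" : String.join("\t", Arrays.copyOfRange(fields, from, to));
    }

    static double toNumber(String s) {
        try {
            return Double.parseDouble(s.trim());
        } catch (NumberFormatException e) {
            return 0.0; // sketch assumption: unparseable values compare as zero
        }
    }

    // Chain one comparator per spec; later specs only break ties of earlier ones.
    static Comparator<String> build(String specs) {
        Comparator<String> result = (a, b) -> 0;
        for (KeySpec spec : parse(specs)) {
            Comparator<String> one = (a, b) -> {
                String ka = extract(a.split("\t", -1), spec);
                String kb = extract(b.split("\t", -1), spec);
                return spec.numeric ? Double.compare(toNumber(ka), toNumber(kb)) : ka.compareTo(kb);
            };
            result = result.thenComparing(spec.reverse ? one.reversed() : one);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(List.of("b\t9", "a\t10", "c\t9", "d\t100"));
        lines.sort(build("-k2,2rn -k1,1"));
        System.out.println(String.join(" ", lines).replace('\t', ':')); // d:100 a:10 b:9 c:9
    }
}
```

The sketch re-splits each line on every comparison for clarity; a real comparator would more likely extract keys once or compare raw bytes in place.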

Note that this specification implicitly defines which part of the string is
the key for shuffling purposes.
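One way to read this note is that the highest field referenced by any -k spec bounds the shuffle key. The following is a minimal sketch under that assumption. It also assumes tab-separated fields with the key as a prefix of the line, and treats an open-ended spec (no ",N") as "the whole line is the key". Partitioning hashes only the derived key, in the style of a hash partitioner but using String.hashCode. All names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the -k specs implicitly fix which leading fields form the shuffle key.
public class ImplicitKey {
    private static final Pattern SPEC = Pattern.compile("-k(\\d+)[nr]*(?:,(\\d+)[nr]*)?");

    // Number of leading fields covered by the specs, or -1 ("whole line")
    // when some spec has no end field.
    static int keyFieldCount(String specs) {
        int max = 0;
        for (String token : specs.trim().split("\\s+")) {
            Matcher m = SPEC.matcher(token);
            if (!m.matches() || Integer.parseInt(m.group(1)) < 1) {
                throw new IllegalArgumentException("unsupported key spec: " + token);
            }
            if (m.group(2) == null) {
                return -1;
            }
            int start = Integer.parseInt(m.group(1));
            int end = Integer.parseInt(m.group(2));
            max = Math.max(max, Math.max(start, end));
        }
        return max;
    }

    // Returns {key, value}: the first n tab-separated fields form the key, the rest the value.
    // n <= 0 means the whole line is the key.
    static String[] splitKeyValue(String line, int n) {
        if (n <= 0) {
            return new String[] {line, ""};
        }
        int pos = -1;
        for (int i = 0; i < n; i++) {
            pos = line.indexOf('\t', pos + 1);
            if (pos < 0) {
                return new String[] {line, ""}; // no more than n fields: whole line is the key
            }
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    // Choose a partition from the derived key only, so equal keys meet in one reducer.
    static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

For "-k2,2rn -k1,1" this gives a two-field key, so lines that agree on their first two fields always reach the same partition, whatever follows.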

>  Streaming should provide an option for numerical sort of keys
> --------------------------------------------------------------
>
>                 Key: HADOOP-2302
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2302
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: lohit vijayarenu
>
> It would be good to have an option for numerical sort of keys for streaming. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
