[ 
https://issues.apache.org/jira/browse/HADOOP-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HADOOP-2302:
--------------------------------

    Attachment: 2302.1.patch

A reasonably well-tested patch. It does the following:
1) The supported options are -n (numeric comparison) and -r (reverse the result 
of the comparison). For example, one could pass 
"-k1.1,1.2 -k2.1,2.3n -k2.4,3.1nr" as the value of the 
mapred.text.key.comparator.job option (which the comparator understands).
2) Some refactoring: I needed access to the findBytes method defined in 
Streaming.UTF8ByteArrayUtils. Since this comparator implementation need not 
depend on the Streaming package, I created a new class, 
org.apache.hadoop.util.UTF8ByteArrayUtils, and filled it with the "real" 
byte-array utility methods. A few Streaming-specific methods, such as findTab, 
also exist in Streaming.UTF8ByteArrayUtils; I moved them to a new class called 
Streaming.StreamKeyValUtil. All in all, I introduced two new classes and 
deprecated Streaming.UTF8ByteArrayUtils.
3) A partitioner is defined that hashes only the portions of the keys that the 
user is interested in (using the same spec as the one defined for the 
comparator).
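
To illustrate item 3, here is a minimal sketch of hashing only a selected 
range of key fields. It is not the code in 2302.1.patch. The class name 
KeyFieldHashSketch and its methods are hypothetical, and the sketch assumes 
tab-separated keys, 1-based field numbers, and whole fields only (no character 
offsets such as the ".1" in "-k2.1").

```java
import java.util.Arrays;

// Hypothetical illustration only; names and behavior are assumptions,
// not the partitioner shipped in the patch.
class KeyFieldHashSketch {

    // Returns the bytes of fields startField..endField (1-based, inclusive).
    // Fields are assumed to be separated by a single tab byte.
    static byte[] selectFields(byte[] key, int startField, int endField) {
        int field = 1;                       // field that position i belongs to
        int from = (startField == 1) ? 0 : -1;
        int to = key.length;
        for (int i = 0; i < key.length; i++) {
            if (key[i] == '\t') {
                if (field == endField) {     // separator that ends the range
                    to = i;
                    break;
                }
                field++;
                if (field == startField) {   // range starts after this tab
                    from = i + 1;
                }
            }
        }
        if (from < 0) {
            return new byte[0];              // key has fewer fields than asked
        }
        return Arrays.copyOfRange(key, from, to);
    }

    // Maps the selected fields to a partition in [0, numPartitions).
    static int partition(byte[] key, int startField, int endField,
                         int numPartitions) {
        byte[] selected = selectFields(key, startField, endField);
        // Mask the sign bit so the modulo result is never negative.
        return (Arrays.hashCode(selected) & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With this sketch, two keys that agree on the selected fields land in the same 
partition regardless of what the other fields contain, which is the property 
the partitioner described above is meant to provide.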

A note: the numCompare method in the comparator is somewhat verbose, but that 
should make the code easier to read.
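
For readers unfamiliar with the idea, the following is a rough sketch of how 
a numeric comparison can work directly on key bytes, together with the -n and 
-r flags. It is not the numCompare method from the patch. The class 
NumericCompareSketch is hypothetical, and the sketch assumes ASCII decimal 
integers with an optional leading '-' (no fractions, no exponents).

```java
// Hypothetical illustration only; not the comparator from the patch.
class NumericCompareSketch {

    // Index of the first byte at or after 'start' that is not '0'.
    static int skipZeros(byte[] b, int start) {
        int i = start;
        while (i < b.length && b[i] == '0') {
            i++;
        }
        return i;
    }

    // Compares two ASCII decimal integers without parsing them into numbers.
    static int numCompare(byte[] a, byte[] b) {
        boolean negA = a.length > 0 && a[0] == '-';
        boolean negB = b.length > 0 && b[0] == '-';
        int i = skipZeros(a, negA ? 1 : 0);
        int j = skipZeros(b, negB ? 1 : 0);
        int lenA = a.length - i;             // significant digits in a
        int lenB = b.length - j;             // significant digits in b
        if (lenA == 0 && lenB == 0) return 0;            // both are zero
        if (lenA == 0) return negB ? 1 : -1;             // a == 0, b != 0
        if (lenB == 0) return negA ? -1 : 1;             // b == 0, a != 0
        if (negA != negB) return negA ? -1 : 1;          // signs differ
        int mag = 0;
        if (lenA != lenB) {
            mag = lenA < lenB ? -1 : 1;      // more digits, larger magnitude
        } else {
            for (int k = 0; k < lenA; k++) {
                if (a[i + k] != b[j + k]) {
                    mag = a[i + k] < b[j + k] ? -1 : 1;
                    break;
                }
            }
        }
        return negA ? -mag : mag;            // negatives order in reverse
    }

    // Plain unsigned byte-wise comparison, used when -n is not given.
    static int lexCompare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int k = 0; k < n; k++) {
            int x = a[k] & 0xff;
            int y = b[k] & 0xff;
            if (x != y) return x < y ? -1 : 1;
        }
        return Integer.compare(a.length, b.length);
    }

    // 'numeric' mirrors -n and 'reverse' mirrors -r.
    static int compare(byte[] a, byte[] b, boolean numeric, boolean reverse) {
        int c = numeric ? numCompare(a, b) : lexCompare(a, b);
        return reverse ? -c : c;
    }
}
```

Under these assumptions, "9" sorts before "10" with numeric set to true, 
while the plain byte-wise comparison puts "10" first; setting reverse flips 
either result.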

>  Streaming should provide an option for numerical sort of keys
> --------------------------------------------------------------
>
>                 Key: HADOOP-2302
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2302
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Lohit Vijayarenu
>            Assignee: Devaraj Das
>             Fix For: 0.19.0
>
>         Attachments: 2302.1.patch
>
>
> It would be good to have an option for numerical sort of keys for streaming. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
