[
https://issues.apache.org/jira/browse/HADOOP-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Devaraj Das updated HADOOP-2302:
--------------------------------
Attachment: 2302.1.patch
A reasonably well-tested patch. It does the following:
1) The supported options are -n (numeric comparison) and -r (reverse the result
of the comparison). For example, one could pass
"-k1.1,1.2 -k2.1,2.3n -k2.4,3.1nr" as the value of the
mapred.text.key.comparator.job option (which the comparator understands).
2) Some refactoring: I needed access to the findBytes method defined in
Streaming.UTF8ByteArrayUtils. Since this comparator implementation need not
depend on the Streaming package, I created a new class,
org.apache.hadoop.util.UTF8ByteArrayUtils, and populated it with the "real"
byte-array utility methods. A few Streaming-specific methods, such as findTab,
also exist in Streaming.UTF8ByteArrayUtils; I moved those to a new class called
Streaming.StreamKeyValUtil. All in all, I introduced two new classes and
deprecated Streaming.UTF8ByteArrayUtils.
3) A partitioner is defined that hashes only the portions of the keys that the
user is interested in (using the same spec as the one defined for the
comparator).
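
To illustrate items 1 and 3, here is a minimal sketch of how a spec such as
"-k2.1,2.3n" might be parsed and then used to hash only the selected key
portions. It is not the code in the patch. The class KeyFieldSketch and its
methods are hypothetical, and it assumes sort(1)-like semantics: 1-based
field.char positions, an inclusive end position, and the full
"-kF.C,F.C[n][r]" form. It works on Strings for brevity, whereas the patch
operates on UTF-8 byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyFieldSketch {
    /** One "-kF1.C1,F2.C2[n][r]" entry (hypothetical type; positions are 1-based). */
    static final class KeySpec {
        final int startField, startChar, endField, endChar;
        final boolean numeric, reverse;
        KeySpec(int sf, int sc, int ef, int ec, boolean numeric, boolean reverse) {
            this.startField = sf; this.startChar = sc;
            this.endField = ef; this.endChar = ec;
            this.numeric = numeric; this.reverse = reverse;
        }
    }

    /** Parses a whitespace-separated list of -k entries. */
    static List<KeySpec> parse(String options) {
        List<KeySpec> specs = new ArrayList<>();
        for (String tok : options.trim().split("\\s+")) {
            if (!tok.startsWith("-k")) {
                throw new IllegalArgumentException("unsupported option: " + tok);
            }
            String body = tok.substring(2);
            int end = body.length();
            boolean numeric = false, reverse = false;
            // Trailing letters are per-key flags: n = numeric, r = reverse.
            while (end > 0 && Character.isLetter(body.charAt(end - 1))) {
                char c = body.charAt(--end);
                if (c == 'n') numeric = true;
                else if (c == 'r') reverse = true;
                else throw new IllegalArgumentException("unknown flag: " + c);
            }
            String[] range = body.substring(0, end).split(",");
            if (range.length != 2) {
                throw new IllegalArgumentException("expected start,end in: " + tok);
            }
            String[] s = range[0].split("\\.");
            String[] e = range[1].split("\\.");
            if (s.length != 2 || e.length != 2) {
                throw new IllegalArgumentException("expected field.char in: " + tok);
            }
            specs.add(new KeySpec(Integer.parseInt(s[0]), Integer.parseInt(s[1]),
                                  Integer.parseInt(e[0]), Integer.parseInt(e[1]),
                                  numeric, reverse));
        }
        return specs;
    }

    /** Returns the part of the key covered by one spec (clamped to the key). */
    static String select(String key, char sep, KeySpec k) {
        List<Integer> starts = new ArrayList<>();   // start offset of each field
        starts.add(0);
        for (int i = 0; i < key.length(); i++) {
            if (key.charAt(i) == sep) starts.add(i + 1);
        }
        int from = k.startField <= starts.size()
                ? starts.get(k.startField - 1) + k.startChar - 1 : key.length();
        int to = k.endField <= starts.size()
                ? starts.get(k.endField - 1) + k.endChar : key.length();
        to = Math.min(to, key.length());
        from = Math.min(from, to);
        return key.substring(from, to);
    }

    /** Hashes only the selected portions, so other parts of the key are ignored. */
    static int partition(String key, char sep, List<KeySpec> specs, int numPartitions) {
        int hash = 0;
        for (KeySpec k : specs) {
            hash = 31 * hash + select(key, sep, k).hashCode();
        }
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With the spec "-k1.1,1.2", two keys that share their first two characters land
in the same partition regardless of the rest of the key.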
A note: the numCompare method in the comparator may be slightly verbose, but
that should help the readability of the code.
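
For readers unfamiliar with the idea, the following sketch shows one way a
numeric comparison over byte ranges, combined with the -n and -r flags, could
look. It is not the numCompare method from the patch. The class
NumCompareSketch and its method names are hypothetical, and it assumes the
compared ranges hold ASCII decimal integers with an optional leading '-'
(no decimal points, no validation).

```java
import java.nio.charset.StandardCharsets;

public class NumCompareSketch {
    /** Compares two ASCII decimal integers stored in [start, end) byte ranges. */
    static int compareNumeric(byte[] a, int aStart, int aEnd,
                              byte[] b, int bStart, int bEnd) {
        boolean aNeg = aStart < aEnd && a[aStart] == '-';
        boolean bNeg = bStart < bEnd && b[bStart] == '-';
        if (aNeg) aStart++;
        if (bNeg) bStart++;
        // Leading zeros do not affect the value.
        while (aStart < aEnd && a[aStart] == '0') aStart++;
        while (bStart < bEnd && b[bStart] == '0') bStart++;
        int aLen = aEnd - aStart, bLen = bEnd - bStart;
        // Treat "-0" (or an empty range) as plain zero.
        if (aLen == 0) aNeg = false;
        if (bLen == 0) bNeg = false;
        if (aNeg != bNeg) return aNeg ? -1 : 1;
        int cmp = 0;
        if (aLen != bLen) {
            cmp = aLen < bLen ? -1 : 1;          // more digits means larger magnitude
        } else {
            for (int i = 0; i < aLen && cmp == 0; i++) {
                cmp = Integer.compare(a[aStart + i], b[bStart + i]);
            }
        }
        return aNeg ? -cmp : cmp;                 // magnitudes flip for negatives
    }

    /** Plain unsigned byte-wise comparison, used when -n is not given. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    /** Applies the numeric (-n) and reverse (-r) flags to one comparison. */
    static int compare(String x, String y, boolean numeric, boolean reverse) {
        byte[] a = x.getBytes(StandardCharsets.UTF_8);
        byte[] b = y.getBytes(StandardCharsets.UTF_8);
        int cmp = numeric
                ? compareNumeric(a, 0, a.length, b, 0, b.length)
                : compareBytes(a, b);
        return reverse ? -cmp : cmp;
    }
}
```

Without -n, "10" sorts before "9" because the bytes are compared left to right;
with -n it sorts after, and adding -r flips that result.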
> Streaming should provide an option for numerical sort of keys
> --------------------------------------------------------------
>
> Key: HADOOP-2302
> URL: https://issues.apache.org/jira/browse/HADOOP-2302
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: Lohit Vijayarenu
> Assignee: Devaraj Das
> Fix For: 0.19.0
>
> Attachments: 2302.1.patch
>
>
> It would be good to have an option for numerical sort of keys for streaming.