[
https://issues.apache.org/jira/browse/HADOOP-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607721#action_12607721
]
Devaraj Das commented on HADOOP-2302:
-------------------------------------
Would it be sufficient for the comparator to support the basic Unix/GNU sort options? Specifically:
-f (ignore case)
-n (sort numerically)
-r (reverse the result of the comparison)
-k _pos1[,pos2]_, where pos is of the form _f[.c][opts]_, where _f_ is the
number of the field to use, and _c_ is the number of the first character from
the beginning of the field. Fields and character positions are numbered
starting with 1; a character position of zero in pos2 indicates the field's
last character. If '.c' is omitted from pos1, it defaults to 1 (the beginning
of the field); if omitted from pos2, it defaults to 0 (the end of the field).
opts are ordering options (any of _fnr_ as described above).
We assume that the fields in the key are separated by the value of
map.output.key.field.separator (this property already exists).
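To make the _f[.c][opts]_ rules above concrete, here is a minimal sketch in plain Java of how a position could be parsed and the key portion cut out of a separator-delimited key. The class name KeyPos and its methods are hypothetical and not part of Hadoop. The rule that an omitted pos2 extends the key to the end of the line is borrowed from GNU sort and is an assumption here, as is clamping out-of-range positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a Hadoop class): one position of the form f[.c][opts]. */
final class KeyPos {
    private static final Pattern POS = Pattern.compile("(\\d+)(?:\\.(\\d+))?([fnr]*)");

    final int field;   // 1-based field number
    final int ch;      // 1-based character position; 0 means "last character" (pos2 only)
    final String opts; // any of f, n, r

    private KeyPos(int field, int ch, String opts) {
        this.field = field;
        this.ch = ch;
        this.opts = opts;
    }

    /** Parses pos1 (isEnd == false, '.c' defaults to 1) or pos2 (isEnd == true, defaults to 0). */
    static KeyPos parse(String spec, boolean isEnd) {
        Matcher m = POS.matcher(spec);
        if (!m.matches()) {
            throw new IllegalArgumentException("bad key position: " + spec);
        }
        int field = Integer.parseInt(m.group(1));
        int ch = m.group(2) != null ? Integer.parseInt(m.group(2)) : (isEnd ? 0 : 1);
        if (field < 1 || (!isEnd && ch < 1)) {
            throw new IllegalArgumentException("positions are numbered from 1: " + spec);
        }
        return new KeyPos(field, ch, m.group(3));
    }

    /** Returns the part of key selected by start..end; end may be null (assumed: to end of key). */
    static String extract(String key, String sep, KeyPos start, KeyPos end) {
        if (sep.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        // Record [begin, endExclusive) offsets of every field in the key.
        List<int[]> bounds = new ArrayList<>();
        int from = 0;
        while (true) {
            int idx = key.indexOf(sep, from);
            if (idx < 0) {
                bounds.add(new int[] {from, key.length()});
                break;
            }
            bounds.add(new int[] {from, idx});
            from = idx + sep.length();
        }
        if (start.field > bounds.size()) {
            return ""; // the start field does not exist in this key
        }
        int[] sb = bounds.get(start.field - 1);
        int begin = Math.min(sb[0] + start.ch - 1, sb[1]);
        int stop;
        if (end == null || end.field > bounds.size()) {
            stop = key.length();
        } else {
            int[] eb = bounds.get(end.field - 1);
            // A character position of 0 selects through the field's last character.
            stop = end.ch == 0 ? eb[1] : Math.min(eb[0] + end.ch, eb[1]);
        }
        return begin < stop ? key.substring(begin, stop) : "";
    }
}
```

For example, with a tab separator, -k 2.2,2.4 applied to the key "a<TAB>12345<TAB>z" would select characters 2 through 4 of the second field.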
Do we need anything else?
Also, this could be done in a Java comparator implementation that the user
specifies to the framework via mapred.output.key.comparator.class. This
comparator would be used by both sort and merge.
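As a rough illustration of how the ordering options might combine, the sketch below builds a java.util.Comparator over already-decoded string keys. The class name SortOptionComparators is hypothetical. A comparator actually plugged in via mapred.output.key.comparator.class would presumably have to implement Hadoop's RawComparator and work on serialized bytes, which this sketch does not attempt. Treating a non-numeric key as zero under -n is a simplification of what GNU sort does.

```java
import java.math.BigDecimal;
import java.util.Comparator;

/** Hypothetical helper (not a Hadoop class): maps the f, n, r options to a comparator. */
final class SortOptionComparators {

    /** opts is any combination of 'f' (ignore case), 'n' (numeric), 'r' (reverse). */
    static Comparator<String> forOptions(String opts) {
        boolean ignoreCase = opts.indexOf('f') >= 0;
        boolean numeric = opts.indexOf('n') >= 0;
        boolean reverse = opts.indexOf('r') >= 0;

        Comparator<String> cmp;
        if (numeric) {
            // Numeric comparison takes precedence over case folding in this sketch.
            cmp = Comparator.comparing(SortOptionComparators::toNumber);
        } else if (ignoreCase) {
            cmp = String.CASE_INSENSITIVE_ORDER;
        } else {
            cmp = Comparator.naturalOrder();
        }
        return reverse ? cmp.reversed() : cmp;
    }

    /** Assumption: keys that do not parse as a decimal number compare as zero. */
    static BigDecimal toNumber(String s) {
        try {
            return new BigDecimal(s.trim());
        } catch (NumberFormatException e) {
            return BigDecimal.ZERO;
        }
    }
}
```

With this sketch, forOptions("n") orders "9" before "10", whereas the default lexicographic order puts "10" first; this is the behavior the issue asks for.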
> Streaming should provide an option for numerical sort of keys
> --------------------------------------------------------------
>
> Key: HADOOP-2302
> URL: https://issues.apache.org/jira/browse/HADOOP-2302
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: Lohit Vijayarenu
>
> It would be good to have an option for numerical sort of keys for streaming.