tag 22109 notabug
close 22109
stop
Hello Ed,
On 12/07/2015 10:36 AM, Ed Brambley wrote:
> The following problem came to light following a StackOverflow question [1]. The
> lexical ordering of sort appears to depend on the delimiter used, and I believe
> it shouldn't. As a minimal example:
>
> ### Correct ordering ###
> $ printf "1,a,1\n2,aa,2" | LC_ALL=C sort -k2 -t,
> 1,a,1
> 2,aa,2
>
> ### Incorrect ordering by replacing the "," delimiter by "~" ###
> $ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2 -t~
> 2~aa~2
> 1~a~1
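This is not a bug in 'sort', but an incorrect usage of the key option.
The parameter "-k2" means: use the second field *and all characters until the end
of the line* as the key for each line.
In this case the delimiter after the second field is part of the key, so the
keys compared are "a,1" vs "aa,2" in the first example and "a~1" vs "aa~2" in
the second. In the C locale ',' sorts before 'a' while '~' sorts after it,
which is why the order flips.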
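The byte order of the three characters involved can be checked directly. This is only a small sketch, assuming a POSIX shell; "byte_sort" is a hypothetical helper name, not part of coreutils:

```shell
# Hypothetical helper: sort lines by raw byte value (C locale).
byte_sort() { LC_ALL=C sort "$@"; }

# ',' is 0x2C, 'a' is 0x61, '~' is 0x7E, so this prints: , then a then ~
printf '%s\n' '~' 'a' ',' | byte_sort
```

With "-k2", the second character of the key "a,1" or "a~1" is the delimiter, and it is compared against the second 'a' of "aa".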
The correct usage is to specify the key as "-k2,2" meaning: sort by the second
field alone (then resolve equal keys by comparing the entire line, unless
--stable is used).
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort -k2,2 -t~
1~a~1
2~aa~2
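The whole-line tie-break mentioned above can be illustrated with two lines whose second field is identical. This is a sketch with made-up input, assuming GNU sort (for "-s"):

```shell
# Both lines have the key "x", so the keys compare equal.
# Without -s, sort falls back to comparing the whole line:
printf 'b~x~1\na~x~2\n' | LC_ALL=C sort -t '~' -k2,2
# prints a~x~2 first, then b~x~1

# With -s (--stable), equal keys keep their input order:
printf 'b~x~1\na~x~2\n' | LC_ALL=C sort -s -t '~' -k2,2
# prints b~x~1 first, then a~x~2
```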
Using sort's "--debug" option will illustrate the difference. Each line is
followed by two underlines: the first marks the characters used as the key,
the second marks the whole line, which is used for the last-resort comparison:
Incorrect usage (-k2):
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2 -t~
sort: using simple byte comparison
2~aa~2
  ____
______
1~a~1
  ___
_____
Better usage (-k2,2):
$ printf "1~a~1\n2~aa~2" | LC_ALL=C sort --debug -k2,2 -t~
sort: using simple byte comparison
1~a~1
  _
_____
2~aa~2
  __
______
regards,
- assaf