Is this a bug in 'sort'?
I want to sort a tab-delimited file numerically on its second column.
However, the second column begins with text ("fig") followed by
numbers, so what I want is "fig2" followed by "fig17", since 2 is less
than 17. Here's a sample transcript of using sort on one-column and
two-column samples of the data:
# baseline with one-column data
[1] $ echo -e 'fig17\nfig2'
fig17
fig2
# text sort
[2] $ echo -e 'fig17\nfig2' | sort
fig17
fig2
# numerical sort -- works fine using the -k option and specifying the
# character offset
[3] $ echo -e 'fig17\nfig2' | sort -k 1.4n
fig2
fig17
# baseline with two-column data
[4] $ echo -e 'x\tfig17\nx\tfig2'
x fig17
x fig2
# numerical sort -- identical to one-column sort except the key has
# been incremented by one
# 'sort' does not sort numerically as expected
[5] $ echo -e 'x\tfig17\nx\tfig2' | sort -k 2.4n
x fig17
x fig2
# increase the character count by one -- numerical sort works
[6] $ echo -e 'x\tfig17\nx\tfig2' | sort -k 2.5n
x fig2
x fig17
# explicitly specify the field delimiter -- numerical sort works
[7] $ echo -e 'x\tfig17\nx\tfig2' | sort -t$'\t' -k 2.4n
x fig2
x fig17
So, to get 'sort' to work on multi-column data, I had to use one of
two workarounds: 1) increase the character offset by one or 2)
explicitly specify the delimiter. Would this be considered a bug? Or
am I overlooking some setting? Can anyone else replicate my results?
Not sure of the version of sort, but it is bundled with
coreutils-5.2.1-48.1 on FC4.
BTW, step 5 works as expected on OS X 10.4.6.
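
In case it helps anyone trying to replicate this, here is a minimal
sketch that collects the two-column cases into one script. The helper
names (sample, sort_by_fig_number) and the TAB variable are my own,
purely for illustration. I'm assuming a POSIX shell, using printf
instead of echo -e because echo's handling of -e varies between shells,
and setting LC_ALL=C to keep locale settings out of the picture. Only
the explicit-delimiter variant carries an expected output, since step 5
gave different results on FC4 and OS X.

```shell
#!/bin/sh
# Hypothetical helper names; none of these are part of sort itself.

# A literal tab character, to pass to sort -t.
TAB=$(printf '\t')

# Emit the two-column sample data from the transcript.
sample() {
    printf 'x\tfig17\nx\tfig2\n'
}

# Workaround 2 from the transcript: name the delimiter explicitly,
# then sort numerically starting at character 4 of field 2.
sort_by_fig_number() {
    LC_ALL=C sort -t "$TAB" -k 2.4n
}

echo "explicit delimiter (step 7):"
sample | sort_by_fig_number
# prints the fig2 line first, then the fig17 line

echo "no delimiter given (step 5), result varied between my systems:"
sample | LC_ALL=C sort -k 2.4n
```

If the second listing comes out in the same order as the first on your
system, then your sort behaves like the one on OS X rather than the one
on FC4.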
Regards,
- Robert