Thank you very much for your time, and sorry for the trouble. If I understand this right, specifying 'b' in the start field lets the numeric key match, which spares me the fallback sort of the complete line. And this actually does the trick. I remain a little in the dark regarding the dictionary vs. byte (en_US.UTF-8 vs. C locale) ordering. I've tried both on asd2 (without the 'b') with the same result. But I trust you on this one.
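
A minimal sketch of the difference the 'b' makes, assuming GNU sort. The file names (demo.tsv, without_b.txt, with_b.txt) and the three sample lines are made up for illustration; they only mimic the shape of asd2 and are not the attached data.

```shell
# Hypothetical tab-separated sample in the same shape as asd2 (not the real file).
printf 'PRAMEF4\tchr2\ncoding_gene\tchr\nPRAMEF1\tchr10\n' > demo.tsv

# Without 'b': the tab before "chr" belongs to field 2, so character 4 is 'r'.
# No key is numeric, every key compares equal, and the whole-line fallback
# decides the order.
LC_ALL=C sort -k 2.4,2n demo.tsv > without_b.txt

# With 'b' on the start position: leading blanks are skipped, character 4 is
# whatever follows "chr", and the numeric key does the sorting.
LC_ALL=C sort -k 2.4b,2n demo.tsv > with_b.txt

cat without_b.txt
echo
cat with_b.txt
```

With this sample the first listing should come out in byte order of the whole line (PRAMEF1, PRAMEF4, coding_gene), and the second in numeric order of the key (coding_gene with no number, then chr2, then chr10). In the C locale the uppercase names sort before coding_gene in the fallback; under a dictionary-style locale that fallback order may differ.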
Francesco

P.S.: Just got Gordon's reply. Thank you for that.

On Wed February 2 2011 17:44, Eric Blake wrote:
> On 02/02/2011 05:42 AM, Francesco Bettella wrote:
> > hi,
> > I may have bumped into an undesired feature/bug of sort, which appears to be
> > still present in the version 8.9 of coreutils.
>
> Thanks for the report. However, this is a feature, and not a bug, of sort.
>
> >
> > I'm issuing the following sort commands (see attached files):
> >
> > [prompt1] > sort -k 1.4,1n asd1 > asd1.sorted
> >
> > [prompt2] > sort -k 2.4,2n asd2 > asd2.sorted
>
> If I'm correct, asd1 and asd2 have the same contents, except that you
> have swapped columns 1 and 2 between the two and resorted the lines.
> And your desired goal is that the output matches asd1.sorted, again with
> the columns swapped for asd2.sorted.
>
> >
> > the first one works as I would expect, the second one doesn't.
>
> Let's examine why:
>
> $ head -3 asd1 | sort -k 1.4,1n --debug
> sort: using `en_US.UTF-8' sorting rules
> sort: leading blanks are significant in key 1; consider also specifying `b'
> chr>coding_gene
>    ^ no match for key
> _______________
> chr1>PRAMEF1
>    _
> ____________
> chr1>PRAMEF4
>    _
> ____________
> $ head -3 asd1 | LC_ALL=C sort -k 1.4,1n --debug
> sort: using simple byte comparison
> sort: leading blanks are significant in key 1; consider also specifying `b'
> chr>coding_gene
>    ^ no match for key
> _______________
> chr1>PRAMEF1
>    _
> ____________
> chr1>PRAMEF4
>    _
> ____________
>
> In both cases, when there is no match for a key but numeric sorting was
> requested, then that line sorts first; meanwhile, you get the fallback
> sort of the complete line after the first key has been sorted, so that
> the end result matches asd1.sorted whether you use the C locale or
> dictionary sorting.
>
> But notice that warning about not using -b, and how it affects asd2 (and
> also, how the difference in dictionary vs. byte-ordering plays a role in
> the secondary sorting):
>
> $ head -3 asd2 | sort -k 2.4,2n --debug
> sort: using `en_US.UTF-8' sorting rules
> sort: leading blanks are significant in key 1; consider also specifying `b'
> coding_gene>chr
>               ^ no match for key
> _______________
> PRAMEF1>chr1
>           ^ no match for key
> ____________
> PRAMEF4>chr1
>           ^ no match for key
> ____________
> $ head -3 asd2 | LC_ALL=C sort -k 2.4,2n --debug
> sort: using simple byte comparison
> sort: leading blanks are significant in key 1; consider also specifying `b'
> PRAMEF1>chr1
>           ^ no match for key
> ____________
> PRAMEF4>chr1
>           ^ no match for key
> ____________
> coding_gene>chr
>               ^ no match for key
>
> But when you add -b (note, b is the one option you have to add to the
> start field, since it affects start and end fields specially; all other
> options can be added to start, end, or both, and affect the entire key):
>
> $ head -3 asd2 | sort -k 2.4b,2n --debug
> sort: using `en_US.UTF-8' sorting rules
> coding_gene>chr
>                ^ no match for key
> _______________
> PRAMEF1>chr1
>            _
> ____________
> PRAMEF4>chr1
>            _
> ____________
> $ head -3 asd2 | LC_ALL=C coreutils/src/sort -k 2.4b,2n --debug
> coreutils/src/sort: using simple byte comparison
> coding_gene>chr
>                ^ no match for key
> _______________
> PRAMEF1>chr1
>            _
> ____________
> PRAMEF4>chr1
>            _
> ____________
>
> That is, your expectations were insufficient - without telling sort
> enough additional information, sort correctly followed what you told it
> to do, but what you told it was not what you meant. And the --debug
> option is your [new] friend :)
>
> --
> Eric Blake [email protected] +1-801-349-2682
> Libvirt virtualization library http://libvirt.org
