On 28 June 2010 18:07, Eric Blake <[email protected]> wrote:
> On 06/28/2010 08:26 AM, Victor Grishchenko wrote:
> Thanks for the report. However, I don't think this is a bug in sort,
> but rather a misunderstanding on your part. Your command says to use as
> your primary key the substring consisting of fields 17 through 30, and
> as secondary key the entire line.

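A minimal sketch of the field semantics described above. The two sample lines are made up in the same layout as the reported data, and the temporary file and variable names are hypothetical. With only five blank-separated fields per line, the key 17,30 is empty, so the ordering should come entirely from the whole-line comparison, whereas a key of 2,2 compares the second field only.

```shell
# Hypothetical sample data: two made-up lines with five fields each.
sample=$(mktemp)
printf '%s\n' \
  '0_01_19_377_086 vtt1_200 vtt2_9#8 Tdata (0,8132)' \
  '0_01_19_377_087 vtt1_100 vtt2_9#8 Tdata (0,8132)' > "$sample"

# No key at all: whole-line comparison.
plain=$(LC_ALL=C sort "$sample")

# Fields 17 through 30 do not exist here, so the key is empty on every
# line and the whole-line comparison decides the order.
by_fields=$(LC_ALL=C sort -k 17,30 "$sample")

# Field 2 only: the vtt1_100 line should now come first.
by_field2=$(LC_ALL=C sort -k 2,2 "$sample")

printf 'plain:\n%s\n\n-k 17,30:\n%s\n\n-k 2,2:\n%s\n' \
  "$plain" "$by_fields" "$by_field2"

rm -f "$sample"
```

LC_ALL=C is set only to keep the comparison byte-wise and independent of the local environment.
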
My fault. It would probably make sense to reference the POS format
explanation from the -k option description.

> What did you intend to sort by? If you were typing 17,30 thinking you
> were getting bytes instead of fields, thus meaning:
>> 0_01_19_377_086 vtt1_100 vtt2_9#8 Tdata (0,8132)
>  ................^^^^^^^^^^^^^^..................

Well, that would be closer to the intended result. As I see now, I need
--key=2 --stable, i.e. from the 2nd field through the end of the line,
stable.

By the way, regarding the LC_ALL warning in the man page: my colleague
and I have "independently discovered" that non-C locales can slow sort
down by an order of magnitude. It would probably make sense to add that
to the warning.

$ time ( gzcat vtt2_98.gz | LC_ALL=ru_RU.UTF-8 sort > /dev/null )
real 1m52.153s
user 1m41.614s
sys 0m1.395s

$ time ( gzcat vtt2_98.gz | LC_ALL=C sort > /dev/null )
real 0m10.096s
user 0m4.255s
sys 0m1.186s

> Also, the next version of coreutils will include 'sort --debug' that
> gives you a visual indication of what bytes are actually being compared,
> which would have given you a clue that your --key=17,30 was selecting
> data outside the range of your input.

That is really good, because the absence of any error reports
contributed to the confusion.

--
Victor
