Consider this three-line source file, say /tmp/foo:
M  Build/zfile
M  Master/mfile
MM Build/afile
There are two spaces after the M on the first two lines (and no trailing
spaces on any line).  I was trying to sort on the second "field".

I run
  LC_ALL=en_US.UTF-8 sort --debug -k 2 /tmp/foo  # or -k 2,2 et al.
And get the nicely explanatory output for the "surprising" result:
  sort: using ‘en_US.UTF-8’ sorting rules
  sort: leading blanks are significant in key 1; consider also specifying 'b'
  MM Build/afile
  ...

However, if I run that same command in the C locale:
  LC_ALL=C sort --debug -k 2 /tmp/foo  # or -k 2,2 et al.
the output lacks that crucial commentary line:
  sort: leading blanks are significant ...

But the information is just as valid in C as in UTF-8, so far as I can
see.  Thus it would be nice for it to be present.

It would also be nice if the definition of "key 1" were stated.
It is awfully easy to misread that as "field 1".

More importantly, I urge that the documentation for sort give an example
of this.  The idea that the blanks preceding a field become part of
that field is highly counter-intuitive.  The information is
implicitly there in "non-blank to blank transition", but it is a common
confounding of expectations and deserves explicit mention, IMHO.  (If
it's there, sorry, I didn't see it.)
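
Something along these lines is the kind of example I have in mind.  It
is only a sketch: the temporary file and the variable names (f, plain,
fixed) are mine, and I use the C locale so that the comparison is plain
byte order and the result does not depend on locale collation rules.
The 'b' modifier is the one the --debug message suggests.

```shell
# Recreate the three-line file from this report (two spaces after "M").
f=$(mktemp)
printf 'M  Build/zfile\nM  Master/mfile\nMM Build/afile\n' > "$f"

# Default: field 2 includes its leading blanks, so the keys compared are
# "  Build/zfile", "  Master/mfile" and " Build/afile".
plain=$(LC_ALL=C sort -k 2 "$f")

# With the 'b' modifier, leading blanks are skipped before comparing,
# so the keys are "Build/zfile", "Master/mfile" and "Build/afile".
fixed=$(LC_ALL=C sort -k 2b "$f")

printf 'without b:\n%s\n\nwith b:\n%s\n' "$plain" "$fixed"
rm -f "$f"
```

In the C locale, the first sort leaves "MM Build/afile" last, because a
space sorts before "B" and the other two keys begin with two spaces; the
second sort puts it first, which is what I expected all along.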

This is with coreutils 8.25 (from original source).

Thanks,
Karl


