On 04/08/2013 01:27 AM, Ray Dillinger wrote:
> It turns out that 'sort' is grabbing locale information now and doing a
> locale-aware sort

Yes, this behavior has been required by POSIX for more than 20 years now (POSIX 1003.2-1992 was the first document to standardize it, and it standardized what was already existing practice at that time).

> (hence failing to treat different lengths of blankspace differently
> and failing to treat any punctuation characters as significant -- at
> least in my case).

Yes, this is one of the effects of 'sort' being required to do locale-aware sorting. In fact, it is a FAQ:
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

> There is a workaround; one can set the locale to 'C' or 'POSIX' directly
> in a script (or at the shell prompt) and then set it back after calling
> 'sort'.

That is not just a workaround, but the POSIX-mandated way to get sane sorting results. Script writers have been doing this for years.

> But I dislike that workaround firstly because it complicates the writing
> of scripts adding boilerplate in many scripts that could be added instead
> just in 'sort' itself, secondly because I don't want to be mucking around
> with the locale from the command line, thirdly because that means people
> with other locales can't get error messages etc in their own languages if
> they're using a simplified sort, and fourth because there are too many
> ways it can fail.

You can still get error messages in the language you want, while collating in the C locale, by setting LC_COLLATE=C and leaving LC_ALL unset (a non-empty LC_ALL overrides every other locale variable, including LC_MESSAGES). As for your dislike of using locale environment variables for their intended purpose, you'll just have to get over it, the way other script writers have learned to.

> So I decided it would be cleaner to hack a new command line option into
> 'sort' itself to explicitly invoke the simple traditional sorting behavior.
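For comparison with a new option, here is a minimal sketch of the LC_COLLATE approach described above in a POSIX shell script. The function name 'byte_sort' is hypothetical, chosen only for illustration. A VAR=value prefix applies only to the single 'sort' command, so nothing has to be set back afterwards, and diagnostics still follow the user's LANG or LC_MESSAGES. It assumes LC_ALL is unset or empty, since a non-empty LC_ALL would override LC_COLLATE.

```shell
#!/bin/sh
# A non-empty LC_ALL overrides LC_COLLATE, so make sure it is unset.
unset LC_ALL

# byte_sort (hypothetical helper name): sort by raw byte value.
# The LC_COLLATE=C prefix is placed in the environment of this one
# 'sort' process only; the script's other locale settings (and the
# language of its messages) are left untouched, so nothing needs to
# be "set back" afterwards.
byte_sort() {
    LC_COLLATE=C sort "$@"
}

printf '%s\n' b A a B _x | byte_sort
```

In byte order, uppercase ASCII letters come before '_', which comes before lowercase letters, so this prints A, B, _x, a and b, one per line. 'LC_ALL=C sort' is also portable, but it switches messages to the C locale as well.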
> Since 'c' and 'C' are already taken, I used the 'POSIX' locale instead
> of the 'C' locale, and gave it short option '-P' and long option
> '--posix-simple', with help string 'use POSIX locale (simple byte-value)
> comparisons.'

Thanks for trying to write a patch. However, we are unlikely to apply it. The existing POSIX-mandated use of LANG/LC_COLLATE/LC_ALL already exposes this knob portably, while your option would exist only in GNU coreutils (and even then, it would take a couple of years to reach all the distros you are likely to use). Teaching people to rely on a non-portable extension when a portable solution already exists is a bit counterproductive.

> The diff is against the Debian distribution's coreutils-8.13 source code,

We prefer diffs against the latest coreutils.git, as sort.c has changed since the 8.13 release that Debian ships.

> I have attached the diff file.

Your diff file was in the default "normal" format (plain diff without options, which reads like an ed script). That format is useless once the source has changed since the patch was written, because it carries no context showing which lines were meant to change. Also, you attached the entire body of sort.c, which doesn't really help us. We prefer patches in unified form (diff -u), can also use context form (diff -c), and don't need a copy of sort.c alongside the patch. See the HACKING file in coreutils for more details on the preferred way to submit a patch.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
