Re: How to sort unicode properly?

Eric Blake Wed, 25 Sep 2019 10:24:56 -0700

On 9/25/19 10:56 AM, Peng Yu wrote:

I want to make my `sort` to be machine-independent and always use the
correct Unicode sort order. Is there a way to do so?

Those two goals are somewhat at odds. The only truly portable
machine-independent sorting is the one guaranteed by POSIX when you use
LC_ALL=C (fun fact: even on an EBCDIC machine, that is required by POSIX
to collate in ASCII order, rather than native byte order). The moment
you use any other locale, you are not only left to the mercies of
whoever wrote that locale, but also stuck with the fact that there is no
portable way to transfer locale definitions from one vendor's libc to
another.
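
A minimal sketch of the portable option, assuming the input is UTF-8
encoded. Under LC_ALL=C the comparison is by byte value, so the result
should be the same on any POSIX system, but "é" (bytes 0xC3 0xA9) lands
after every ASCII letter rather than next to "e". The variable name
sorted_c is only illustrative.

```shell
# Byte-order sort: portable, but not linguistically "correct".
# In UTF-8, "é" is the two bytes 0xC3 0xA9, both greater than "f" (0x66),
# so "café" sorts after "caff".
sorted_c=$(printf '%s\n' cafe caff café | LC_ALL=C sort)
printf '%s\n' "$sorted_c"
# prints:
# cafe
# caff
# café
```

This is the trade-off in concrete form: the ordering is reproducible
everywhere, but it is not the ordering a Unicode-aware locale would give.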


I don't know how to check where en_US.UTF-8 comes from. Do you know
how to check it? (I use Mac OS X.)

All other locales are somewhat vendor-dependent; as you've discovered,
your vendor (Apple) has a rather gaping hole in their locale support.
But because Apple is a closed-source shop, it will have to be Apple that
fixes their bug, unless you want to take on the gargantuan task of
writing a gnulib module that provides locale tables to mirror glibc for
use on non-glibc machines.


Note that glibc doesn't have that problem, at least on my system:

$ cat /etc/fedora-release
Fedora release 30 (Thirty)
$ rpm -q glibc
glibc-2.29-22.fc30.x86_64
$ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8  sort --debug
sort: text ordering performed using ‘en_US.UTF-8’ sorting rules
cafe
____
café
____
caff
____

So one option you could pursue is switching to an operating system that
does not curtail your freedoms.
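
To see which behavior a particular machine has, the three-word test
above can be turned into a rough probe. This is only a heuristic sketch:
probe_collation is a hypothetical helper name, and it assumes that a
locale with working collation rules places "café" before "caff", as the
glibc output above does. A locale that is missing, or that falls back to
byte order, produces the same result as LC_ALL=C.

```shell
# Hypothetical helper: compare a locale's sort order against plain byte
# order for a small accented sample and report whether they differ.
probe_collation() {
    loc=$1
    # Sort the sample in the requested locale; hide any warning that
    # sort may print if the locale is not installed.
    in_locale=$(printf '%s\n' cafe caff café | LC_ALL=$loc sort 2>/dev/null)
    # Sort the same sample by byte value as the reference.
    in_c=$(printf '%s\n' cafe caff café | LC_ALL=C sort)
    if [ "$in_locale" = "$in_c" ]; then
        echo "$loc: same as byte order"
    else
        echo "$loc: locale-specific collation in effect"
    fi
}

probe_collation en_US.UTF-8
```

Running `locale -a` alongside this helps tell the two "same as byte
order" cases apart: if the locale name is not listed, it is simply not
installed; if it is listed, its collation tables are likely the gap.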


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
