I want my `sort` to be machine-independent and to always use the correct Unicode sort order. Is there a way to do so?
I don't know how to check where en_US.UTF-8 comes from. Do you know how to check it? (I use Mac OS X.)

On 9/25/19, Eric Blake <ebl...@redhat.com> wrote:
> On 9/25/19 10:20 AM, Peng Yu wrote:
>> Hi,
>>
>> It seems that "café" should be sorted before "caff" in Unicode.
>>
>> https://github.com/jtauber/pyuca
>>
>> But `sort` does not do so.
>>
>> $ printf '%s\n' cafe caff café | LC_ALL=UTF8 sort
>> cafe
>> caff
>> café
>> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8 sort
>> cafe
>> caff
>> café
>>
>> How to make `sort` sort according to Unicode order? Thanks.
>
> You'll have to write a locale definition where strcoll() sorts in the
> order you want. Coreutils sort is calling strcoll(), and if it doesn't
> sort the way you think it should, the bug is in your locale and not in
> coreutils. You'll want to report this issue to whoever provided your
> en_US.UTF-8 locale (perhaps glibc?)
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc. +1-919-301-3226
> Virtualization: qemu.org | libvirt.org
>

--
Regards,
Peng
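
For anyone following the thread, here is a rough sketch of how one might inspect the locale that strcoll() is using, and how to get output that is at least machine-independent. It makes several assumptions. The path /usr/share/locale is where macOS and the BSDs usually keep locale data, and it may differ elsewhere. macOS ships its own C library rather than glibc, so the locale tables come from the operating system. The helper name `sort_bytes` is made up for this example. Note that `LC_ALL=C` gives plain byte order, which is reproducible across machines but is not Unicode collation order. Getting "café" before "caff" would still need a locale with proper collation rules or a tool outside `sort`, such as the pyuca library linked above.

```shell
# Show which locale settings affect collation.
# LC_ALL overrides LC_COLLATE, which in turn overrides LANG.
if command -v locale >/dev/null 2>&1; then
    locale | grep -E '^(LANG|LC_COLLATE|LC_ALL)=' || true
fi

# Assumed location of the collation table (typical for macOS/BSD).
# If it is a symbolic link, its target shows which rules are really used.
collate_file=/usr/share/locale/en_US.UTF-8/LC_COLLATE
if [ -e "$collate_file" ]; then
    ls -l "$collate_file"
else
    echo "no $collate_file on this system"
fi

# Hypothetical helper: sort stdin by raw byte values, independent of
# whatever locale data the machine has installed.
sort_bytes() {
    LC_ALL=C sort
}

printf '%s\n' cafe caff café | sort_bytes
```

With `sort_bytes` the result no longer depends on the installed locale tables, so two machines should agree with each other even though the order is not the one the Unicode Collation Algorithm would produce.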