On Thu, Feb 07, 2019 at 02:40:06PM +0000, Simon McVittie wrote:
> On Thu, 07 Feb 2019 at 14:05:33 +0100, Adam Borowski wrote:
> > a locale for a silly country with weird customs
>
> Please don't take this tone. Insulting people who disagree with you[1]
> is rarely an effective way to persuade them that you're right and
> they're wrong.

I don't quite see how speech peppered with words like "imperialism" could be
taken seriously as an insult, aside from bad-old-days Soviet propaganda. If I
still didn't mark the tone as in jest clearly enough, then apologies.

> > • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i,
> >   making dpkg-reconfigure locales DTRT, making it the d-i default)
>
> I think this is exactly the "international/culture-neutral English"
> locale that you're looking for.

Yeah.

> (Well, the C/POSIX locale is the formally
> standardized form of that, but breaks text outside the ASCII range;
> C.UTF-8 is the C locale with Unicode support added.)

Not really -- behaviour of C/POSIX for characters above 126 is _undefined_.
That locale is defined in a weird convoluted way designed to allow both ASCII
and IBM's encoding standards (aka variants of EBCDIC).

The only way I found so far that our current C.UTF-8 fails POSIX's demands
for "C" is:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
7.3.1 LC_CTYPE
blank
# In the POSIX locale, only the <space> and <tab> shall be included.

Another point is that the result of setlocale(..., "") when the env vars are
unset is implementation-defined. I'd change it to result not in "C" but in
C.UTF-8.

> > • inventing a new locale "en" without a country bias
> >   -- good in the long term but problematic a month before freeze
>
> I assume this would be a UTF-8 locale like en_US.utf8 and en_GB.utf8,
> so probably en.utf8, possibly with a simple "en" alias?

Yeah, with a non-US time and date format. Possibly also collation where a
space is not ignored -- ie, dictionary order common to most of the world but
not the US -- "foo xxx" < "foobar". C.* does this, en_US.* does not.

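To make the "foo xxx" vs "foobar" point concrete, here is a minimal Python
sketch. It does not call into any libc. c_order() is plain codepoint
comparison, which matches what C.* does for ASCII input.
punct_ignoring_order() and ignore_punct_key() are made-up helpers that only
approximate the en_US behaviour by dropping everything that is not a letter
or a digit before comparing. Real glibc collation is multi-level and more
subtle than this, so treat it as an illustration rather than a
reimplementation.

```python
def c_order(items):
    """Sort by codepoint, as the C / C.UTF-8 locale does for ASCII input."""
    return sorted(items)


def ignore_punct_key(s):
    """Hypothetical key: drop spaces and punctuation before comparing.

    This is only a rough stand-in for a collation that ignores
    non-alphanumeric characters; it is not glibc's actual algorithm.
    """
    return "".join(ch for ch in s if ch.isalnum())


def punct_ignoring_order(items):
    """Sort as if spaces and punctuation were not there."""
    return sorted(items, key=ignore_punct_key)


words = ["foobar", "foo xxx"]
print(c_order(words))               # space sorts before 'b'
print(punct_ignoring_order(words))  # "foobar" vs "fooxxx"
```

In codepoint order "foo xxx" comes first because the space sorts before any
letter. With spaces dropped, the comparison becomes "foobar" vs "fooxxx", so
"foobar" comes first.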
Even worse, en_US ignores all (or most) non-letters, inconsistently with
other operating systems and libcs:

glibc:
    0 9
    0.9.0
    0.9.0-a0-foo-bar
    ({---=[ 0.9.0-a11 ]=---})
    0.9.0-a17-quux
    (0.9.0-a2)
    0.9.0+a99-1
    0.9.0-rc1
    0.9.1
    0 9 9
    ({---=[ 0.9-a11 ]=---})
    0.9 ab

Windows, musl, ...:
    (0.9.0-a2)
    ({---=[ 0.9.0-a11 ]=---})
    ({---=[ 0.9-a11 ]=---})
    0 9
    0 9 9
    0.9 ab
    0.9.0
    0.9.0+a99-1
    0.9.0-a0-foo-bar
    0.9.0-a17-quux
    0.9.0-rc1
    0.9.1

> As you say, I don't think a country-neutral specifically-English locale
> is going to happen before buster.

On the other hand, adding it but not using it by default would probably be a
very good idea: in the future, it'd avoid situations where ssh-ing from one
machine to one running stable would have the default locale fail.

> How would this locale differ from C.UTF-8? Is the only difference
> that C.UTF-8 has strict lexicographical sorting, whereas "en" would have
> case-insensitive sorting like en_GB.utf8 does? (If that's the only
> difference, then perhaps something like "LANG=C.utf8 LC_COLLATE=en_US.utf8"
> is enough.)

I can't recall any other difference off the top of my head, yeah.
LC_COLLATE=en_US.UTF-8 has that space-ignoring nastiness, though.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄⠀⠀⠀⠀