On Thu, Feb 07, 2019 at 02:40:06PM +0000, Simon McVittie wrote:
> On Thu, 07 Feb 2019 at 14:05:33 +0100, Adam Borowski wrote:
> > a locale for a silly country with weird customs
> 
> Please don't take this tone. Insulting people who disagree with you[1]
> is rarely an effective way to persuade them that you're right and
> they're wrong.

I don't quite see how speech peppered with words like "imperialism" could be
taken seriously as insults, aside from bad-old-days Soviet propaganda.

If I still didn't mark the tone clearly enough as being in jest, my apologies.

> > • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i,
> >   making dpkg-reconfigure locales DTRT, making it the d-i default)
> 
> I think this is exactly the "international/culture-neutral English"
> locale that you're looking for.

Yeah.

> (Well, the C/POSIX locale is the formally
> standardized form of that, but breaks text outside the ASCII range;
> C.UTF-8 is the C locale with Unicode support added.)

Not really -- behaviour of C/POSIX for characters above 127 is _undefined_.

That locale is defined in a weird convoluted way designed to allow both
ASCII and IBM's encryption standards (aka variants of EBCDIC).

The only way I've found so far in which our current C.UTF-8 fails POSIX's
demands for "C" is:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
7.3.1 LC_CTYPE
blank
        # In the POSIX locale, only the <space> and <tab> shall be included.


Another point is that the result of setlocale(..., "") when the env vars are
unset is implementation-defined.  I'd change it to yield C.UTF-8 rather than "C".
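
To see what a given system does today, a minimal sketch along these lines
might help.  It uses Python's locale.setlocale(), which wraps the C call.
The helper name default_collate_locale is made up for illustration.  It
assumes that LC_ALL, LC_COLLATE and LANG are the variables consulted for
LC_COLLATE, as on glibc.  The returned name is whatever the libc picks, so
no particular value is promised here.

```python
import locale
import os

# Variables assumed to be consulted for LC_COLLATE (glibc-style lookup).
LOCALE_VARS = ("LC_ALL", "LC_COLLATE", "LANG")


def default_collate_locale():
    """Return what setlocale(LC_COLLATE, "") selects with the env vars unset."""
    # Temporarily remove the variables from this process's environment.
    saved_env = {k: os.environ.pop(k) for k in LOCALE_VARS if k in os.environ}
    # Calling setlocale() without a locale argument only queries the setting.
    saved_locale = locale.setlocale(locale.LC_COLLATE)
    try:
        return locale.setlocale(locale.LC_COLLATE, "")
    finally:
        # Put the process back the way it was.
        locale.setlocale(locale.LC_COLLATE, saved_locale)
        os.environ.update(saved_env)


if __name__ == "__main__":
    print(default_collate_locale())
```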

> > • inventing a new locale "en" without a country bias
> >   -- good in the long term but problematic a month before freeze
> 
> I assume this would be a UTF-8 locale like en_US.utf8 and en_GB.utf8,
> so probably en.utf8, possibly with a simple "en" alias?

Yeah, with a non-US time and date format.  Possibly also collation where a
space is not ignored -- ie, dictionary order common to most of the world but
not the US -- "foo xxx" < "foobar".  C.* does this, en_US.* does not.  Even
worse, en_US ignores all (or most) non-letters, inconsistently with other
operating systems and libcs:

glibc:

0 9
0.9.0
0.9.0-a0-foo-bar
({---=[ 0.9.0-a11 ]=---})
0.9.0-a17-quux
(0.9.0-a2)
0.9.0+a99-1
0.9.0-rc1
0.9.1
0 9 9
({---=[ 0.9-a11 ]=---})
0.9 ab

Windows, musl, ...:

(0.9.0-a2)
({---=[ 0.9.0-a11 ]=---})
({---=[ 0.9-a11 ]=---})
0 9
0 9 9
0.9 ab
0.9.0
0.9.0+a99-1
0.9.0-a0-foo-bar
0.9.0-a17-quux
0.9.0-rc1
0.9.1
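
The "foo xxx" vs "foobar" point can be illustrated without depending on any
installed locale.  The sketch below compares plain code point order (what
strcmp-style collation such as C.* gives) with a key that drops every
non-alphanumeric character.  That key is only a crude stand-in for glibc's
en_US rules, not the real collation tables, and the function names are made
up for illustration.

```python
def codepoint_sorted(strings):
    """Sort by raw code point order, as strcmp-style collation does."""
    return sorted(strings)


def punctuation_blind_key(s):
    """Drop spaces and punctuation: a rough stand-in for en_US in glibc."""
    return "".join(ch for ch in s if ch.isalnum())


def punctuation_blind_sorted(strings):
    """Sort while ignoring every non-alphanumeric character."""
    return sorted(strings, key=punctuation_blind_key)


pair = ["foobar", "foo xxx"]
print(codepoint_sorted(pair))          # ['foo xxx', 'foobar']
print(punctuation_blind_sorted(pair))  # ['foobar', 'foo xxx']
```

Fed the twelve version-like strings above, punctuation_blind_sorted() gives
the same order as the glibc list, which suggests that ignoring non-letters
is most of what is going on there.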


> As you say, I don't think a country-neutral specifically-English locale
> is going to happen before buster.

On the other hand, adding it but not using it by default would probably be a
very good idea: in the future, it'd avoid situations where ssh-ing from one
machine to another running stable would have the default locale fail.

> How would this locale differ from C.UTF-8? Is the only difference
> that C.UTF-8 has strict lexicographical sorting, whereas "en" would have
> case-insensitive sorting like en_GB.utf8 does? (If that's the only
> difference, then perhaps something like "LANG=C.utf8 LC_COLLATE=en_US.utf8"
> is enough.)

I can't recall any other difference off the top of my head, yeah.
LC_COLLATE=en_US.UTF-8 has that space-ignoring nastiness, though.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄⠀⠀⠀⠀
