Note: I've started collating the feedback from the thread at
https://github.com/python/peps/issues/171

On 4 January 2017 at 01:56, Barry Warsaw <ba...@python.org> wrote:
> A question and a suggestion.
>
> On Jan 03, 2017, at 04:00 PM, Nick Coghlan wrote:
>
>>* in Py_Initialize, emit a warning on stderr regarding limited Unicode
>>compatibility if we detect that LC_CTYPE is set to the "C" locale
>
> So just to be clear, you propose only to check for exactly the "C" locale?
> For example, my default locale is en_US.UTF-8 which would not trigger the
> warning.  I wouldn't want it to warn on any .UTF-8 locale since those should
> be fine too.  (I.e. it's just C locale's implicit ASCII that's the problem.)

It explicitly checks whether the result of
"setlocale(LC_CTYPE, NULL)" is the exact string "C", as that's what
you get in the cases of interest (i.e. no locale configured, or the
configured locale doesn't exist on the current system).
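As a rough illustration of that check, here is a minimal C sketch. The helper name `lc_ctype_is_c_locale` is hypothetical, and it assumes the caller has already run `setlocale(LC_CTYPE, "")` so the environment has been consulted (a C program otherwise starts in the "C" locale whatever the environment says).

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if LC_CTYPE is currently the plain "C" locale.
 * Assumes setlocale(LC_CTYPE, "") has already been called, so this reflects
 * LC_ALL / LC_CTYPE / LANG rather than the program's startup default. */
static bool lc_ctype_is_c_locale(void)
{
    /* A NULL second argument queries the setting without changing it. */
    const char *ctype = setlocale(LC_CTYPE, NULL);

    /* Both "no locale configured" and "configured locale not installed"
     * leave LC_CTYPE at "C", so an exact string match covers both cases. */
    return ctype != NULL && strcmp(ctype, "C") == 0;
}
```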

>>* in Programs/python.c (i.e. the C level main() implementation), set LANG
>>and LC_ALL in the environment to "C.UTF-8" if we detect that the locale is
>>otherwise set to "C"
>>* skip the coercion if PYTHONALLOWCLOCALE is set so developers running in
>>recent system Python versions with this implemented can still debug
>>problems that only show up in older Python 3.x releases, or in embedding
>>applications that still use the C locale
>
> I have nits to pick about the envar name and warning text.
>
> I understand the desire to have a positive setting affect this but it feels
> more like PYTHONCOERCECLOCALE=0 would be a more descriptive name and setting.

That could be done (checking for the exact string "0", the same way we
do for PYTHONHTTPSVERIFY in PEP 493).
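A minimal sketch of that opt-out check, assuming the same exact-"0" comparison. The function name is hypothetical; it takes the raw getenv() result so the logic can be exercised without touching the real environment.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: decide whether locale coercion should run, given the
 * value of PYTHONCOERCECLOCALE (NULL when the variable is unset).
 * Only the exact string "0" disables coercion; "1", "", "no", etc. do not. */
static bool c_locale_coercion_enabled(const char *env_value)
{
    return env_value == NULL || strcmp(env_value, "0") != 0;
}

/* Typical call site (sketch only):
 *     if (c_locale_coercion_enabled(getenv("PYTHONCOERCECLOCALE"))) { ... }
 */
```

Note that with this rule PYTHONCOERCECLOCALE=1 is accepted but simply leaves the default behaviour in place.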

> That could be problematic because it doesn't allow any value;
> i.e. PYTHONCOERCECLOCALE=1 wouldn't make sense to disable locale coercion.  I
> think my unease about the name stems from potential misunderstandings about C
> vs. C.UTF-8, but maybe I'm just worried about a non-problem.  Consider this a
> challenge for a better envar name... or a bikeshed to ignore. :)

It's a fair concern, as I believe the C and C.UTF-8 locales are the
same aside from the default text encoding. The proposal is essentially
to coerce C.ASCII to C.UTF-8 as we've collectively found the former to
be nigh-unusable in practice.

The more I think about it, the more I like the suggested change, as it
means the verb used in the environment variable ("coerce") matches the
one in the warning ("coercing"), rather than relying on folks
realising that "allow" is the opposite of "coerce" in this context.

> On to the warnings:
>
>     When Py_Initialize is called and CPython detects that the configured
>     locale is the default C locale, the following warning will be issued:
>
>     Py_Initialize detected LC_CTYPE=C, which limits Unicode
>     compatibility. Some libraries and operating system interfaces may not work
>     correctly. Set `PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to configure a similar
>     environment when running Python directly.
>
> I find this confusing on several fronts.  I think it might be better to say
> "Embedded Python" rather than "Py_Initialize" since end users who are using an
> application with Python embedded probably will have no idea what
> "Py_Initialize" is, and they are the ones who will see this warning first.

I avoided the term "embedded", as I think it would be confusing when
locale coercion is disabled for the main Python CLI app.

>  It
> also feels odd to provide instructions on how to reproduce this in `python`
> cli from the embedded warning.

That was a request from some of the Fedora folks, as many of the
developers encountering this warning are expected to be software
maintenance engineers that will want to reproduce integration issues
in a standalone Python runtime.

However, I agree it reads strangely, and it's arguably redundant given
the locale coercion warning when running the main Python CLI app. So
I'll drop it from the upstream PEP, and if we decide we really want it
for the Fedora system Python, we can tweak the wording in a downstream
patch.

> It also doesn't say that the locale is being
> coerced.

The embedded runtime *doesn't* do any locale coercion itself - by the
time it runs, it's too late to change the locale, so it just complains
without doing anything about it.

>  What about:
>
>     Embedded Python detected LC_CTYPE=C (a locale with default ASCII
>     encoding), which may cause Unicode compatibility problems.  Coercing the
>     locale to C.UTF-8.  Set the environment variable PYTHONALLOWCLOCALE=1 to
>     prevent this coercion.

Given my above comments, this warning would end up looking something like:

    Python runtime initialized with LC_CTYPE=C (a locale with default
    ASCII encoding), which may cause Unicode compatibility problems.
    Configuring C.UTF-8 as a Unicode-compatible alternative locale is
    recommended.
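Since the runtime can only report the problem rather than fix it, the initialization-time side reduces to writing that message. A minimal sketch, with a hypothetical function name and the draft wording above (the final text may differ); the caller is assumed to pass in the result of the "C" locale check.

```c
#include <stdbool.h>
#include <stdio.h>

/* Draft wording from this thread; not necessarily the final PEP text. */
static const char C_LOCALE_WARNING[] =
    "Python runtime initialized with LC_CTYPE=C (a locale with default "
    "ASCII encoding), which may cause Unicode compatibility problems. "
    "Configuring C.UTF-8 as a Unicode-compatible alternative locale is "
    "recommended.\n";

/* Hypothetical helper: the runtime cannot change the locale this late, so
 * it only reports the problem. Returns true if the warning was written. */
static bool warn_if_c_locale(FILE *stream, bool lc_ctype_is_c)
{
    if (!lc_ctype_is_c)
        return false;
    /* fputs returns EOF on failure, a non-negative value on success. */
    return fputs(C_LOCALE_WARNING, stream) != EOF;
}
```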

> If C.UTF-8 isn't available, then the warning would read:
>
>     Embedded Python detected LC_CTYPE=C (a locale with default ASCII
>     encoding), which may cause some Unicode compatibility problems.  Coercion
>     to C.UTF-8 locale is not possible.  Set the environment variable
>     PYTHONALLOWCLOCALE=1 to suppress this warning.

Hmm, I hadn't accounted for the fact that the CLI can actually tell
whether or not the coercion to C.UTF-8 worked (as 'setlocale(LC_ALL,
"")' will return NULL if the configured locale doesn't exist). That
means we can try C.UTF-8 first, and then fall back to en_US.UTF-8
(which would be sufficient to get CentOS and RHEL 5/6/7 working
automatically, and likely a lot of other distros as well), before
finally giving up and letting the "C" default stand.
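A minimal sketch of that fallback order. To stay short, it hands each candidate name straight to setlocale() (which likewise returns NULL for a locale that isn't installed) rather than exporting LANG/LC_ALL and calling setlocale(LC_ALL, "") as the CLI change would, so in this sketch child processes would not inherit the result. The helper name and candidate table are illustrative.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Candidate replacements for the "C" locale, tried in order. */
static const char *const coercion_targets[] = {"C.UTF-8", "en_US.UTF-8"};

/* Hypothetical helper: if LC_CTYPE is "C", switch it to the first installed
 * candidate and return that name. Returns NULL if LC_CTYPE was not "C", or
 * if no candidate exists (LC_CTYPE is then left as "C"). */
static const char *coerce_c_locale(void)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL || strcmp(current, "C") != 0)
        return NULL; /* Only the plain "C" locale is coerced. */

    size_t count = sizeof coercion_targets / sizeof coercion_targets[0];
    for (size_t i = 0; i < count; i++) {
        /* setlocale() returns NULL, changing nothing, if the name is unknown. */
        if (setlocale(LC_CTYPE, coercion_targets[i]) != NULL)
            return coercion_targets[i];
    }
    return NULL; /* Give up and let the "C" default stand. */
}
```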

> I'd use the same text for `python` as cli except s/Embedded Python/Python/

If you missed that, then I think I need to highlight more clearly in
the PEP that the library does not, and cannot, coerce the locale to
C.UTF-8: Py_Initialize runs too late in the startup process for that
to work the way we would want it to.

The changes needed to incorporate the config flags and preprocessor
definitions discussed below should help with that.

> I also think there should be a compile-time or run-time flag that embedders
> could set so that they could explicitly disable the warning or coercion.
> Something like ASCIILOCALEISFINEANDYESIKNOWWHATIAMDOINGSOSTFU=1

Ugh, M4 macros :)

But yeah, that's a good idea. Since the runtime initialization warning
and the CLI locale coercion are technically independent, what do you
think about adding two flags:

* --with[out]-c-locale-coercion (setting PY_COERCE_C_LOCALE for the
CLI behaviour)
* --with[out]-c-locale-warning (setting PY_WARN_ON_C_LOCALE for the
runtime initialization behaviour)
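Because the two behaviours are independent, each macro would gate its own code path. A sketch of that shape, assuming configure writes each macro into pyconfig.h for the corresponding --with option and leaves it undefined for --without. Here the effect of `--with-c-locale-warning --without-c-locale-coercion` is simulated with a local #define, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Normally supplied by pyconfig.h; simulated here so the sketch compiles
 * standalone. PY_COERCE_C_LOCALE is deliberately left undefined. */
#define PY_WARN_ON_C_LOCALE 1

/* Runtime initialization side: is the C locale warning compiled in? */
static bool build_warns_on_c_locale(void)
{
#ifdef PY_WARN_ON_C_LOCALE
    return true;
#else
    return false;
#endif
}

/* CLI side: is C locale coercion compiled in? */
static bool build_coerces_c_locale(void)
{
#ifdef PY_COERCE_C_LOCALE
    return true;
#else
    return false;
#endif
}
```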

>>* grant a priori permission to redistributors to backport this to older
>>versions (as we'd like to include the change in the Fedora system Python
>>for F26, which will be based on Python 3.6.0)
>
> I think that's fine, but I doubt we'll need it for Debian and derivatives.

If more people were in the habit of setting sensible locales in their
Docker base images, I doubt I would be bothered about it for Fedora et
al either.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Linux-sig mailing list
Linux-sig@python.org
https://mail.python.org/mailman/listinfo/linux-sig
