Note: I've started collating the feedback from the thread at https://github.com/python/peps/issues/171
On 4 January 2017 at 01:56, Barry Warsaw <ba...@python.org> wrote:
> A question and a suggestion.
>
> On Jan 03, 2017, at 04:00 PM, Nick Coghlan wrote:
>
>>* in Py_Initialize, emit a warning on stderr regarding limited Unicode
>>compatibility if we detect that LC_CTYPE is set to the "C" locale
>
> So just to be clear, you propose only to check for exactly the "C" locale?
> For example, my default locale is en_US.UTF-8 which would not trigger the
> warning. I wouldn't want it to warn on any .UTF-8 locale since those should
> be fine too. (I.e. it's just C locale's implicit ASCII that's the problem.)

It's explicitly checking for whether or not the result of "setlocale(LC_CTYPE, NULL)" is the exact string "C", as that's what you get in the cases of interest (i.e. no locale configured, or the configured locale doesn't exist on the current system).

>>* in Programs/python.c (i.e. the C level main() implementation), set LANG
>>and LC_ALL in the environment to "C.UTF-8" if we detect that the locale is
>>otherwise set to "C"
>>* skip the coercion if PYTHONALLOWCLOCALE is set so developers running in
>>recent system Python versions with this implemented can still debug
>>problems that only show up in older Python 3.x releases, or in embedding
>>applications that still use the C locale
>
> I have nits to pick about the envar name and warning text.
>
> I understand the desire to have a positive setting affect this but it feels
> more like PYTHONCOERCECLOCALE=0 would be a more descriptive name and setting.

That could be done (checking for the exact string "0", the same way we do for PYTHONHTTPSVERIFY in PEP 493).

> That could be problematic because it doesn't allow any value;
> i.e. PYTHONCOERCECLOCALE=1 wouldn't make sense to disable locale coercion. I
> think my unease about the name stems from potential misunderstandings about C
> vs. C.UTF-8, but maybe I'm just worried about a non-problem. Consider this a
> challenge for a better envar name... or a bikeshed to ignore.
> :)

It's a fair concern, as I believe the C and C.UTF-8 locales are the same aside from the default text encoding. The proposal is essentially to coerce C.ASCII to C.UTF-8 as we've collectively found the former to be nigh-unusable in practice.

The more I think about it, the more I like the suggested change, as it means the verb used in the environment variable ("coerce") matches the one in the warning ("coercing"), rather than relying on folks realising that "allow" is the opposite of "coerce" in this context.

> On to the warnings:
>
> When Py_Initialize is called and CPython detects that the configured
> locale is the default C locale, the following warning will be issued:
>
> Py_Initialize detected LC_CTYPE=C, which limits Unicode
> compatibility. Some libraries and operating system interfaces may not work
> correctly. Set `PYTHONALLOWCLOCALE=1 LC_CTYPE=C` to configure a similar
> environment when running Python directly.
>
> I find this confusing on several fronts. I think it might be better to say
> "Embedded Python" rather than "Py_Initialize" since end users who are using an
> application with Python embedded probably will have no idea what
> "Py_Initialize" is, and they are the ones who will see this warning first.

I avoided the term "embedded", as I think it would be confusing when locale coercion is disabled for the main Python CLI app.

> It
> also feels odd to provide instructions on how to reproduce this in `python`
> cli from the embedded warning.

That was a request from some of the Fedora folks, as many of the developers encountering this warning are expected to be software maintenance engineers that will want to reproduce integration issues in a standalone Python runtime. However, I agree it reads strangely, and it's arguably redundant given the locale coercion warning when running the main Python CLI app.
So I'll drop it from the upstream PEP, and if we decide we really want it for the Fedora system Python, we can tweak the wording in a downstream patch.

> It also doesn't say that the locale is being
> coerced.

The embedded runtime *doesn't* do any locale coercion itself: by the time it runs, it's too late to change the locale, so it just complains without doing anything about it.

> What about:
>
> Embedded Python detected LC_CTYPE=C (a locale with default ASCII
> encoding), which may cause Unicode compatibility problems. Coercing the
> locale to C.UTF-8. Set the environment variable PYTHONALLOWCLOCALE=1 to
> prevent this coercion.

Given my above comments, this warning would end up looking something like:

    Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
    encoding), which may cause Unicode compatibility problems. Configuring
    C.UTF-8 as a Unicode-compatible alternative locale is recommended.

> If C.UTF-8 isn't available, then the warning would read:
>
> Embedded Python detected LC_CTYPE=C (a locale with default ASCII
> encoding), which may cause some Unicode compatibility problems. Coercion
> to C.UTF-8 locale is not possible. Set the environment variable
> PYTHONALLOWCLOCALE=1 to suppress this warning.

Hmm, I hadn't accounted for the fact that the CLI can actually tell whether or not the coercion to C.UTF-8 worked (as 'setlocale(LC_ALL, "")' will return NULL if the configured locale doesn't exist). That means we can try C.UTF-8 first, and then fall back to en_US.UTF-8 (which would be sufficient to get CentOS and RHEL 5/6/7 working automatically, and likely a lot of other distros as well), before finally giving up and letting the "C" default stand.
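To make the CLI-side steps concrete, here is a minimal C sketch under stated assumptions. It checks whether `setlocale(LC_CTYPE, NULL)` reports exactly "C" and honours the proposed `PYTHONCOERCECLOCALE=0` opt-out (exact string "0" only). It then tries C.UTF-8 and falls back to en_US.UTF-8, relying on `setlocale(LC_ALL, "")` returning NULL for a missing locale. All function names (`c_locale_detected`, `coercion_disabled_by`, `coerce_c_locale`, `maybe_coerce_c_locale`) are hypothetical. Overriding only `LC_ALL`, rather than both `LANG` and `LC_ALL` as originally proposed, is a simplifying assumption. This is not the actual CPython patch.

```c
#define _POSIX_C_SOURCE 200809L /* for setenv(), unsetenv(), strdup() */
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: returns 1 if LC_CTYPE is exactly "C". Call setlocale(LC_ALL, "")
 * first; if no locale is configured, or the configured one doesn't exist,
 * that call leaves LC_CTYPE at the "C" default, which this check reports. */
static int c_locale_detected(void)
{
    /* A NULL locale argument queries the setting without changing it. */
    const char *ctype = setlocale(LC_CTYPE, NULL);
    return ctype != NULL && strcmp(ctype, "C") == 0;
}

/* Hypothetical: only the exact string "0" disables coercion (the
 * PYTHONHTTPSVERIFY convention); unset, "1" or anything else leaves it on. */
static int coercion_disabled_by(const char *value)
{
    return value != NULL && strcmp(value, "0") == 0;
}

/* Target locales, tried in the order suggested in the thread. */
static const char *const candidate_locales[] = {"C.UTF-8", "en_US.UTF-8"};

/* Hypothetical: returns the name of the locale that was applied, or NULL
 * if no candidate exists (the "C" default is then left in place). */
static const char *coerce_c_locale(void)
{
    const char *original = getenv("LC_ALL");
    char *saved = original != NULL ? strdup(original) : NULL;
    size_t count = sizeof candidate_locales / sizeof candidate_locales[0];

    for (size_t i = 0; i < count; i++) {
        /* Export the candidate so child processes inherit it too. */
        if (setenv("LC_ALL", candidate_locales[i], 1) != 0) {
            break;
        }
        /* setlocale() returns NULL, and leaves the locale unchanged,
         * if the configured locale doesn't exist on this system. */
        if (setlocale(LC_ALL, "") != NULL) {
            free(saved);
            return candidate_locales[i];
        }
    }

    /* Give up: restore LC_ALL so it doesn't name a missing locale. */
    if (saved != NULL) {
        setenv("LC_ALL", saved, 1);
        free(saved);
    } else {
        unsetenv("LC_ALL");
    }
    return NULL;
}

/* Hypothetical CLI prologue tying the steps together. */
static const char *maybe_coerce_c_locale(void)
{
    setlocale(LC_ALL, "");
    if (!c_locale_detected() ||
        coercion_disabled_by(getenv("PYTHONCOERCECLOCALE"))) {
        return NULL;
    }
    return coerce_c_locale();
}
```

Because a failed `setlocale(LC_ALL, "")` leaves the process locale untouched, running out of candidates simply leaves the "C" default standing, which matches the final "giving up" step.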
> I'd use the same text for `python` as cli except s/Embedded Python/Python/

In case you missed it, I think I need to better highlight in the PEP that the library does not, and cannot, coerce the locale to C.UTF-8: Py_Initialize runs too late in the startup process for that to work the way we would want it to. The changes needed to incorporate the config flags and preprocessor definitions discussed below should help with that.

> I also think there should be a compile-time or run-time flag that embedders
> could set so that they could explicitly disable the warning or coercion.
> Something like ASCIILOCALEISFINEANDYESIKNOWWHATIAMDOINGSOSTFU=1

Ugh, M4 macros :)

But yeah, that's a good idea. Since the runtime initialization warning and the CLI locale coercion are technically independent, what do you think about adding two flags:

* --with[out]-c-locale-coercion (setting PY_COERCE_C_LOCALE for the CLI behaviour)
* --with[out]-c-locale-warning (setting PY_WARN_ON_C_LOCALE for the runtime initialization behaviour)

>>* grant a priori permission to redistributors to backport this to older
>>versions (as we'd like to include the change in the Fedora system Python
>>for F26, which will be based on Python 3.6.0)
>
> I think that's fine, but I doubt we'll need it for Debian and derivatives.

If more people were in the habit of setting sensible locales in their Docker base images, I doubt I would be bothered about it for Fedora et al either.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
_______________________________________________
Linux-sig mailing list
Linux-sig@python.org
https://mail.python.org/mailman/listinfo/linux-sig
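The two proposed configure options might map onto preprocessor guards roughly as follows. This is a sketch only. It assumes `PY_COERCE_C_LOCALE` and `PY_WARN_ON_C_LOCALE` are 0/1 macros that default to enabled; the thread doesn't say how configure would actually define them. The function names `cli_coercion_enabled` and `warn_if_c_locale` are hypothetical. The warning text is the wording proposed earlier in this message.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Assumed 0/1 macros, intended to be controlled by the proposed
 * --with[out]-c-locale-coercion and --with[out]-c-locale-warning
 * configure options. Both default to enabled in this sketch. */
#ifndef PY_COERCE_C_LOCALE
#define PY_COERCE_C_LOCALE 1
#endif
#ifndef PY_WARN_ON_C_LOCALE
#define PY_WARN_ON_C_LOCALE 1
#endif

/* CLI side (Programs/python.c in the proposal): whether main() should
 * attempt locale coercion before the runtime is initialized. */
static int cli_coercion_enabled(void)
{
    return PY_COERCE_C_LOCALE;
}

/* Runtime side (Py_Initialize): it runs too late to change the locale,
 * so all it can do is warn. Returns 1 if a warning was written. */
static int warn_if_c_locale(FILE *stream)
{
#if PY_WARN_ON_C_LOCALE
    const char *ctype = setlocale(LC_CTYPE, NULL);
    if (ctype != NULL && strcmp(ctype, "C") == 0) {
        fputs("Python runtime initialized with LC_CTYPE=C (a locale with "
              "default ASCII encoding), which may cause Unicode "
              "compatibility problems. Configuring C.UTF-8 as a "
              "Unicode-compatible alternative locale is recommended.\n",
              stream);
        return 1;
    }
#else
    (void)stream; /* warning compiled out */
#endif
    return 0;
}
```

Building with `-DPY_WARN_ON_C_LOCALE=0` would compile the runtime warning out while leaving the CLI coercion switch alone. That is the independence the two-flag split is meant to give embedders.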