On 21 May 2017 at 02:36, Steve Dower <steve.do...@python.org> wrote:
> On 20May2017 0820, Nick Coghlan wrote:
>>
>> Good point regarding the fact that the Windows 16-bit APIs only come
>> into play for interactive sessions (even in 3.6+), while for PEP 517
>> we're specifically interested in the 8-bit pipes used to communicate
>> with build subprocesses launched by an installation tool.
>
>
> I need to catch up on the PEP (and thanks Brett for alerting me to the
> thread), but this comment in particular cements the mental diagram I have
> right now:
>
> (build UI) <--> (build tool) <--> (compiler)
> ( Python ) <--> ( Python )   <--> (anything)
>
> I'll probably read the PEP closely and see that this is entirely incorrect,
> but if it's right:
>
> * encoding for text between the build UI and build tool should just be
> specified once for all platforms (i.e. use UTF-8).
> * encoding for text between build tool and the compiler depends on the
> compiler
Alas, it isn't quite that simple. Let's take the current de facto standard case:

(user console/CI build log) <-> pip <-> setup.py (distutils/setuptools) <-> 3rd party tool

Key usability feature:

* when requested, informational messages from 3rd party tools SHOULD be made available to the end user for debugging purposes

Ideal outcome:

* everything that makes it to the user console or CI build log is readable by the end user

Essential requirement:

* encoding problems in informational messages emitted by 3rd party tools MUST NOT cause the build to fail

Now, the easiest way to handle the essential requirement as the author of an installation or build tool is to choose not to deal with it: instead, you just treat the output from further downstream as opaque binary data, and let the user console/CI build log layer deal with any encoding problems as they see fit. You may end up with some build failures that are a pain to debug because you're getting nonsense from the build pipeline, but you won't fail your build *because* some particular build tool emitted improperly encoded nonsense.

That all changes if we *require* UTF-8 on the link between the installation tool (e.g. pip) and the build tool (e.g. setup.py).
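For comparison, here is roughly what the "don't deal with it" approach could look like inside an installation tool. This is a minimal Python 3 sketch that assumes the build tool runs as a subprocess. `relay_build_output` is a hypothetical name, and merging stderr into stdout is a choice made only for this example; neither is taken from pip.

```python
import subprocess
import sys


def relay_build_output(cmd, sink=None):
    """Run a build command, copying its output to `sink` as raw bytes.

    Nothing is decoded, so improperly encoded messages from the build tool
    (or the 3rd party tools it calls) cannot raise UnicodeDecodeError here;
    interpreting the bytes is left to whatever ends up displaying `sink`.
    """
    if sink is None:
        # Binary layer of our own stdout (Python 3), bypassing text encoding.
        sink = sys.stdout.buffer
    # Merge stderr into stdout so both end up in the same relayed byte stream.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # Copy fixed-size chunks until EOF (read() returns b"" at end of stream).
    for chunk in iter(lambda: proc.stdout.read(4096), b""):
        sink.write(chunk)
        sink.flush()
    proc.stdout.close()
    # Success or failure is decided by the exit status, never by message content.
    return proc.wait()
```

The bytes reach the console or CI log exactly as the build tool emitted them, so any mojibake is the display layer's problem rather than a reason for the build to fail.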
If we do require UTF-8 there:

* the installation tool can't just pass along build tool output to the user console or CI build log any more; it has a nominal obligation to try to interpret it as UTF-8
* the build tool (or build tool shim) can't just pass along 3rd party tool output to the installation tool any more; it has a nominal obligation to try to get it to emit UTF-8

Now, *particular* installation and build tools may want to strongly encourage the use of UTF-8 in an effort to get closer to the ideal outcome, but that isn't the key objective of PEP 517. The key objective of PEP 517 is to make it easier to use *general purpose* build systems that happen to be implemented in Python (like waf, scons, and meson) to handle complex build scenarios, while also allowing the use of simpler Python-only build systems (like flit) for distribution of pure Python projects.

That said, the PEP *could* explicitly define a short list of behaviours that we consider reasonable in an installation tool:

1. Treat the informational output from the build tool as an opaque binary stream
2. Treat the informational output from the build tool as a text stream encoded using locale.getpreferredencoding(), and decode it using the backslashreplace error handler
3. Treat the informational output from the build tool as a UTF-8 encoded text stream, and decode it using the backslashreplace error handler

We'd just need to caveat the latter two options with the fact that they'll give you a cryptic error message on Python 3.4 and earlier (including Python 2):

>>> b"\xf0\x01\x02\x03".decode("utf-8", "backslashreplace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ncoghlan/devel/py27/Lib/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
TypeError: don't know how to handle UnicodeDecodeError in error callback

I had to look that up on Stack Overflow myself, but what it's trying to say is that until Python 3.5, "backslashreplace" only worked for encoding, not for decoding. That means that for earlier versions, you'd need to define your own custom error handler as described in http://stackoverflow.com/questions/25442954/how-should-i-decode-bytes-using-ascii-without-losing-any-junk-bytes-if-xmlch/25443356#25443356

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
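For Python 3.4 and earlier, the custom error handler mentioned above might look something like the following minimal sketch, which covers options 2 and 3 from the list. It is not copied from the linked Stack Overflow answer. The handler name "decode_backslashreplace" and the helper `decode_build_output` are hypothetical, and the code is written to run on Python 2.7 as well as 3.x.

```python
import codecs
import locale


def _backslashreplace_decode(exc):
    """Replace undecodable bytes with \\xNN escapes while decoding.

    Needed because the built-in "backslashreplace" handler only supports
    decoding from Python 3.5 onwards.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_bytes = exc.object[exc.start:exc.end]
    # bytearray() yields ints on both Python 2 and Python 3.
    replacement = u"".join(u"\\x%02x" % b for b in bytearray(bad_bytes))
    # Resume decoding immediately after the offending bytes.
    return replacement, exc.end


# Hypothetical handler name, registered once at import time.
codecs.register_error("decode_backslashreplace", _backslashreplace_decode)


def decode_build_output(data, use_locale=False):
    """Decode build tool output per option 2 (locale) or option 3 (UTF-8)."""
    if use_locale:
        encoding = locale.getpreferredencoding(False)
    else:
        encoding = "utf-8"
    return data.decode(encoding, "decode_backslashreplace")
```

On Python 3.5+ the result should match `data.decode(encoding, "backslashreplace")`, so a tool that only supports those versions can use the built-in handler directly.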