I have made a PR against the PEP with my best take on the encoding situation: https://github.com/python/peps/pull/264/files

On Mon, May 22, 2017, at 11:19 AM, Paul Moore wrote:
> On 22 May 2017 at 10:56, Thomas Kluyver <tho...@kluyver.me.uk> wrote:
> > On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
> >> Require that build tools either send UTF-8 to the UI component, or write
> >> bytes to a file and call it a build output. I see no benefit in
> >> requiring both the build tool and the UI tool to guess what the text
> >> encoding is.
> >
> > I'm not proposing that the install tool should try to guess the
> > encoding, but I think a well written install tool shouldn't crash if the
> > build output doesn't match the encoding it expects. Even if the spec
> > says that the build output MUST be UTF-8 encoded, build tools can have
> > bugs, and you don't want the install to fail just because the log
> > isn't correctly encoded.
> >
> > Hence, I think a 'SHOULD' is appropriate for this part of the spec:
> >
> > - To install tool authors, it is clear that they can display the output
> >   as UTF-8 so long as they don't crash if it's invalid.
> > - To build tool authors, it's clear that they can't pass the buck to
> >   install tool authors if output gets jumbled because it's not UTF-8.
>
> I'd say that it's not so much just "well written" install tools. I'd
> say that install tools MUST NOT crash if build tool output isn't in
> the expected encoding. On the other hand, the encoding agreement
> implies that if build tools *do* send data in the correct encoding
> then they are entitled to expect that it will be displayed accurately
> to the end user.
>
> Output can be garbled in two ways:
>
> 1. The build tool does not (or cannot) ensure that its output is in
>    the standard-mandated encoding.
> 2. The install tool cannot display the full range of characters
>    representable in the standard-mandated encoding.
>
> Neither of these should cause a failure. Well written install tools
> should warn in the case of (1) - "I have been passed data that I don't
> understand, I'll do my best to display it but can't guarantee the
> output won't be garbled". In the case of (2), though, that's "as
> expected" - if your OS settings mean you can't display certain
> characters, you shouldn't be surprised if your install tool replaces
> them with a placeholder.
>
> On an implementation note, this boils down to something like the
> following in the install tool:
>
>     # Step 1
>     try:
>         data = decode build output using STD_ENCODING
>     except UnicodeDecodeError:
>         warn "Data is not in expected encoding"
>         data = decode using STD_ENCODING with errors=<some form of replacement>
>
>     # Step 2
>     data = data.encode(MY_OUTPUT_ENCODING, errors=<some form of replacement>).decode(MY_OUTPUT_ENCODING)
>
>     # We now have subprocess output that's safe to display if requested.
>
> As a side note, I find step 2 "sanitise my string to ensure it can be
> safely output" to be a pretty common operation - possibly because
> Python's standard IO streams raise exceptions on unicode errors - and
> I'm surprised there isn't a better way to spell it than the
> encode/decode pair above.
>
> Paul
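
For reference, the two steps in the quoted pseudocode could be spelled in runnable Python roughly as follows. This is only a sketch under assumptions: the function name `sanitise_build_output`, the choice of UTF-8 for `STD_ENCODING`, the `errors="replace"` handler, and the fallback to `sys.stdout.encoding` are illustrative and are not taken from the PEP or from any install tool.

```python
import sys
import warnings

# Assumption: UTF-8 is the encoding the spec asks build tools to emit.
STD_ENCODING = "utf-8"


def sanitise_build_output(raw, output_encoding=None):
    """Turn raw build-tool bytes into text that is safe to display.

    Hypothetical helper: never raises on badly encoded input, and never
    returns characters that the output encoding cannot represent.
    """
    if output_encoding is None:
        # Fall back to ASCII if stdout has no usable encoding attribute.
        output_encoding = getattr(sys.stdout, "encoding", None) or "ascii"

    # Step 1: decode strictly; if that fails, warn and decode again,
    # substituting U+FFFD for the bytes that are not valid.
    try:
        text = raw.decode(STD_ENCODING)
    except UnicodeDecodeError:
        warnings.warn("Build output is not in the expected encoding")
        text = raw.decode(STD_ENCODING, errors="replace")

    # Step 2: round-trip through the output encoding so that anything the
    # display cannot represent is replaced with a placeholder ("?").
    return text.encode(output_encoding, errors="replace").decode(output_encoding)
```

With this sketch, valid UTF-8 bytes for "café" passed with an output encoding of "ascii" come back as "caf?", which is case (2) above: no warning, just a placeholder. Invalid bytes such as b"caf\xe9" trigger the warning from case (1) and still produce displayable text.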