On Mon, May 22, 2017, at 12:02 PM, Paul Moore wrote:
> The only reservation I have is that the choice of UTF-8 means that on
> Windows, build backends pretty much have to explicitly manage tool
> output (as they are pretty much certain *not* to output in UTF-8).
> Build backend writers that aren't aware of this issue (most likely
> because their main platform is not Windows) could very easily choose
> to just pass through the raw bytes, and as a result *all* non-ASCII
> output would be garbled on non-UTF-8 systems.
>
> Would locale.getpreferredencoding() not be a better choice here? I
> know it has issues in some situations on Unix, but are they worse than
> the issues UTF-8 would cause on Windows? After all it's the encoding
> used by subprocess.Popen in "universal newlines" mode...

What if it wants to send a character which can't be encoded in the
locale encoding? It's quite easy on Windows to end up with a character
that you can't encode as cp1252. If the build tool uses
.encode(loc_enc, 'replace'), then you've lost information even before
it gets to the install tool.

It's 2017, I really don't want to go down the 'locale specified
encoding' route again. UTF-8 everywhere!

One affordance I'd consider is a recommendation to install tools that
if captured output is not valid UTF-8, they dump the raw bytes to a
file so that no information is lost. I'm not sure if that
recommendation needs to be in the spec itself, though.

Thomas
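
For illustration only, here is a minimal Python sketch of the three behaviours discussed in this thread. It is not taken from any spec or existing tool: the function names (`transcode_to_utf8`, `lossy_roundtrip`, `decode_captured_output`), the `build-output-` file prefix, and the `.log` suffix are all hypothetical. It assumes a backend that knows which encoding its child tool used, and an install tool that receives the captured output as raw bytes.

```python
import locale
import tempfile


def transcode_to_utf8(raw, source_encoding=None):
    """Backend side: re-encode tool output from its native encoding to UTF-8.

    Assumes the tool wrote in the locale's preferred encoding unless told
    otherwise; that assumption can be wrong for some tools.
    """
    if source_encoding is None:
        source_encoding = locale.getpreferredencoding(False)
    return raw.decode(source_encoding, "replace").encode("utf-8")


def lossy_roundtrip(text, encoding="cp1252"):
    """Show what .encode(loc_enc, 'replace') does to unencodable characters."""
    return text.encode(encoding, "replace").decode(encoding)


def decode_captured_output(raw, dump_dir=None):
    """Install-tool side: decode as UTF-8, keeping the raw bytes on failure.

    Returns (text, dump_path). dump_path is None when raw is valid UTF-8;
    otherwise it names a file holding the untouched bytes, and text is a
    best-effort decoding with U+FFFD in place of the invalid sequences.
    """
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        with tempfile.NamedTemporaryFile(
            mode="wb",
            prefix="build-output-",  # hypothetical naming scheme
            suffix=".log",
            dir=dump_dir,
            delete=False,
        ) as dump_file:
            dump_file.write(raw)
            dump_path = dump_file.name
        return raw.decode("utf-8", "replace"), dump_path
```

With these definitions, `lossy_roundtrip("naïve 中")` returns `"naïve ?"`: the character that cp1252 cannot represent is gone before the output ever reaches the install tool. By contrast, `decode_captured_output(b"\xe9")` returns a replacement character for display plus the path of a file that still contains the original byte.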