On Mon, May 22, 2017, at 12:02 PM, Paul Moore wrote:
> The only reservation I have is that the choice of UTF-8 means that on
> Windows, build backends pretty much have to explicitly manage tool
> output (as they are pretty much certain *not* to output in UTF-8).
> Build backend writers that aren't aware of this issue (most likely
> because their main platform is not Windows) could very easily choose
> to just pass through the raw bytes, and as a result *all* non-ASCII
> output would be garbled on non-UTF-8 systems.
> 
> Would locale.getpreferredencoding() not be a better choice here? I
> know it has issues in some situations on Unix, but are they worse than
> the issues UTF-8 would cause on Windows? After all it's the encoding
> used by subprocess.Popen in "universal newlines" mode...
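For concreteness, the explicit handling described in the quoted paragraph might look like this minimal Python sketch. It assumes the tool writes its output in the locale encoding reported by `locale.getpreferredencoding(False)`. The helper name `to_utf8` is hypothetical and not part of any spec.

```python
import locale
from typing import Optional


def to_utf8(raw: bytes, source_encoding: Optional[str] = None) -> bytes:
    """Re-encode a tool's raw output as UTF-8 (hypothetical backend helper).

    Assumes the tool wrote its output in `source_encoding`, defaulting to
    the locale's preferred encoding (e.g. cp1252 on many Windows systems).
    Undecodable bytes become U+FFFD instead of raising an error.
    """
    if source_encoding is None:
        source_encoding = locale.getpreferredencoding(False)
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


# "café" as a cp1252-based Windows tool would emit it.
print(to_utf8(b"caf\xe9", "cp1252"))  # b'caf\xc3\xa9'
```

A backend that simply passes `raw` through unchanged is the failure mode described above: the install tool would then try to read cp1252 bytes as UTF-8.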

What if the backend wants to send a character that can't be encoded in
the locale encoding? It's quite easy on Windows to end up with a character
that you can't encode as cp1252. If the build tool uses .encode(loc_enc,
'replace'), then you've lost information even before it gets to the
install tool.
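The information loss is easy to demonstrate in Python. The Cyrillic filename below is only an arbitrary example of text that cp1252 cannot represent:

```python
message = "compiling Жук.c"

# cp1252 has no Cyrillic letters, so 'replace' substitutes '?' for each one.
lossy = message.encode("cp1252", errors="replace")
print(lossy)                   # b'compiling ???.c'
print(lossy.decode("cp1252"))  # compiling ???.c  (the original letters are gone)

# UTF-8 can encode any str, so the round trip is exact.
lossless = message.encode("utf-8")
assert lossless.decode("utf-8") == message
```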

It's 2017; I really don't want to go down the 'locale-specified
encoding' route again. UTF-8 everywhere!

One affordance I'd consider is recommending that install tools dump
captured output to a file as raw bytes if it is not valid UTF-8, so
that no information is lost. I'm not sure if that recommendation needs
to be in the spec itself, though.
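A minimal sketch of that install-tool recommendation, in Python. The function name `decode_backend_output` and the temporary-file naming are hypothetical illustrations, not anything a spec defines:

```python
import tempfile
from typing import Optional, Tuple


def decode_backend_output(raw: bytes) -> Tuple[str, Optional[str]]:
    """Decode captured backend output as UTF-8 (hypothetical install-tool helper).

    Returns the text to display and, if `raw` was not valid UTF-8, the path
    of a file holding the untouched bytes; otherwise the path is None.
    """
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError:
        # Preserve every byte on disk so nothing is lost...
        with tempfile.NamedTemporaryFile(
            prefix="backend-output-", suffix=".log", delete=False
        ) as f:
            f.write(raw)
        # ...then show the user a lossy but readable version.
        return raw.decode("utf-8", errors="replace"), f.name


text, saved = decode_backend_output(b"caf\xe9")  # cp1252 bytes, not UTF-8
print(text)
if saved is not None:
    print("raw output saved to", saved)
```

Valid UTF-8 passes straight through. Only malformed output pays the cost of a file on disk.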

Thomas
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
