On 20 May 2017 at 17:36, Steve Dower <steve.do...@python.org> wrote:
> In general, since most subprocesses (at least on Windows) do not have
> customizable encodings, the tool that launches them needs to know what the
> encoding is. Since we don't live in a Python 3.6 world quite yet, that means
> the tool should read raw bytes from the compiler and encode them to UTF-8.
Did you spot my point that Visual C produces output that's a mixture of OEM and ANSI code pages? The example I used was:

OEM code page 850, ANSI code page 1252 (standard British English Windows)
Visual Studio 2015

    cl a£b >output.file

The output uses CP850 (in the cl error message) and CP1252 (in the link error) for the £ sign. When run from the command line without redirection, the output is in a consistent encoding. It's only when you redirect the output (I redirected to a file; I assume a pipe would be the same) that you get the problem.

I'd be very surprised if build tool developers got this sort of edge case correct without at least some guidance from the PEP on the sorts of things they need to consider.

You suggest "read raw bytes and encode them to UTF-8", but you don't encode bytes, you encode strings. You still need to convert those bytes to a string first, and there's no encoding you can reliably use for that. You need to use errors="replace" to ensure you can handle inconsistently encoded data without getting an exception.

Paul
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
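The decode-then-encode step described above can be sketched in Python. This is a minimal illustration, not code from the thread: `to_utf8` and `run_and_capture_utf8` are hypothetical helper names, the encoding to try is an assumption the caller has to supply, and the mixed CP850/CP1252 output is simulated with a Python child process rather than a real `cl` run.

```python
import subprocess
import sys
from typing import Sequence


def to_utf8(raw: bytes, guessed_encoding: str) -> bytes:
    """Convert raw subprocess output to UTF-8 bytes.

    Bytes can't be "encoded to UTF-8" directly: they must first be decoded
    to str with some guessed encoding (hypothetical choice by the caller).
    errors="replace" substitutes U+FFFD for anything invalid in that
    encoding instead of raising UnicodeDecodeError.
    """
    text = raw.decode(guessed_encoding, errors="replace")
    return text.encode("utf-8")


def run_and_capture_utf8(cmd: Sequence[str], guessed_encoding: str) -> bytes:
    """Run cmd, capture stdout+stderr as raw bytes, and return them as UTF-8."""
    # No text=/encoding= arguments, so result.stdout is undecoded bytes.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return to_utf8(result.stdout, guessed_encoding)


if __name__ == "__main__":
    # Simulated mixed output: "a£b" once in CP850 (0x9C, as in the cl
    # message) and once in CP1252 (0xA3, as in the link message).
    child = "import sys; sys.stdout.buffer.write(b'a\\x9cb\\n' + b'a\\xa3b\\n')"
    cmd = [sys.executable, "-c", child]
    for enc in ("cp850", "cp1252", "utf-8"):
        lines = run_and_capture_utf8(cmd, enc).decode("utf-8").splitlines()
        # ascii() keeps the demo printable on any console encoding.
        print(enc, ascii(lines))
```

Whichever code page is guessed, one of the two £ signs comes out wrong (CP850 turns 0xA3 into ú, CP1252 turns 0x9C into œ), and guessing UTF-8 turns both into U+FFFD. errors="replace" only guarantees the tool doesn't crash; it can't recover the intended text.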