On 20 May 2017 at 17:36, Steve Dower <steve.do...@python.org> wrote:
> In general, since most subprocesses (at least on Windows) do not have
> customizable encodings, the tool that launches them needs to know what the
> encoding is. Since we don't live in a Python 3.6 world quite yet, that means
> the tool should read raw bytes from the compiler and encode them to UTF-8.

Did you spot my point that Visual C produces output that's a mixture
of OEM and ANSI codepages?

The example I used was:

OEM code page 850, ANSI codepage 1252 (standard British English Windows)

Visual Studio 2015

cl a£b >output.file

The output uses CP850 (in the cl error message) and CP1252 (in the
link error) for the £ sign.
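
To make the byte-level difference concrete, here is a minimal sketch. It only encodes the £ sign itself with Python's standard cp850 and cp1252 codecs; the variable names are mine and it does not reproduce actual cl or link output.

```python
# Illustrative only: the pound sign as each code page would write it.
POUND = "\u00a3"  # the £ character

oem_bytes = POUND.encode("cp850")    # OEM code page 850
ansi_bytes = POUND.encode("cp1252")  # ANSI code page 1252

# The same character is a different byte in each code page.
print(oem_bytes, ansi_bytes)

# Decoding with the "other" code page does not fail; it silently
# yields a different character instead.
wrong_from_oem = oem_bytes.decode("cp1252")
wrong_from_ansi = ansi_bytes.decode("cp850")
print(repr(wrong_from_oem), repr(wrong_from_ansi))
```

So a file containing both byte values for £ cannot be decoded correctly with either code page alone.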

When run from the command line without redirection, the output is in a
consistent encoding. It's only when you redirect the output (I
redirected to a file, I assume a pipe would be the same) that you get
the problem.

I'd be very surprised if build tool developers got this sort of edge
case correct without at least some guidance from the PEP on the sorts
of things they need to consider. You suggest "read raw bytes and
encode them to UTF-8" - but you don't encode bytes, you encode
strings, so you still need to decode those bytes to a string first,
and there's no single encoding you can reliably use for this. You need
to use errors="replace" when decoding to ensure you can handle
inconsistently encoded data without getting an exception.
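
As a rough sketch of what that could look like in a build tool (the helper name bytes_to_utf8, the default of cp1252, and the sample bytes are all my own assumptions, not anything the PEP specifies):

```python
def bytes_to_utf8(raw: bytes, assumed: str = "cp1252") -> bytes:
    """Decode tool output leniently, then re-encode it as UTF-8.

    'assumed' is only a guess at the encoding; errors="replace" keeps
    the decode step from raising on bytes that do not fit that guess.
    """
    text = raw.decode(assumed, errors="replace")
    return text.encode("utf-8")


# Made-up mixed output: one line written in CP850, one in CP1252.
mixed = "a\u00a3b".encode("cp850") + b"\r\n" + "a\u00a3b".encode("cp1252")
result = bytes_to_utf8(mixed).decode("utf-8")
print(repr(result))  # the CP850 line comes out garbled, but nothing raises

# A byte that Python's cp1252 codec leaves undefined: u-umlaut in CP850.
undefined = "\u00fc".encode("cp850")
try:
    undefined.decode("cp1252")  # strict decoding
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace" the same byte becomes U+FFFD instead of raising.
lenient = undefined.decode("cp1252", errors="replace")
```

This avoids the exception but does not recover the original text: the line written in the other code page still comes out as the wrong characters.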

Paul
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
