On 22 May 2017 at 10:56, Thomas Kluyver <tho...@kluyver.me.uk> wrote:
> On Sat, May 20, 2017, at 07:36 PM, Steve Dower wrote:
>> Require that build tools either send UTF-8 to the UI component, or write
>> bytes to a file and call it a build output. I see no benefit in
>> requiring both the build tool and the UI tool to guess what the text
>> encoding is.
>
> I'm not proposing that the install tool should try to guess the
> encoding, but I think a well written install tool shouldn't crash if the
> build output doesn't match the encoding it expects. Even if the spec
> says that the build output MUST be UTF-8 encoded, build tools can have
> bugs, and you don't want want the install to fail just because the log
> isn't correctly encoded.
>
> Hence, I think a 'SHOULD' is appropriate for this part of the spec:
>
> - To install tool authors, it is clear that they can display the output
> as UTF-8 so long as they don't crash if it's invalid.
> - To build tool authors, it's clear that they can't pass the buck to
> install tool authors if output gets jumbled because it's not UTF-8.

I'd say that it's not so much just "well written" install tools. I'd
say that install tools MUST NOT crash if build tool output isn't in
the expected encoding. On the other hand, the encoding agreement
implies that if build tools *do* send data in the correct encoding
then they are entitled to expect that it will be displayed accurately
to the end user.

Output can be garbled in two ways:

1. The build tool does not (or cannot) ensure that its output is in
the standard-mandated encoding.
2. The install tool cannot display the full range of characters
representable in the standard-mandated encoding.

Neither of these should cause a failure. Well written install tools
should warn in the case of (1) - "I have been passed data that I don't
understand, I'll do my best to display it but can't guarantee the
output won't be garbled". In the case of (2), though, that's "as
expected" - if your OS settings mean you can't display certain
characters, you shouldn't be surprised if your install tool replaces
them with a placeholder.

On an implementation note, this boils down to something like the
following in the install tool:

    # Step 1
    try:
        data = decode build output using STD_ENCODING
    except UnicodeDecodeError:
        warn "Data is not in expected encoding"
        data = decode using STD_ENCODING with errors=<some form of replacement>

    # Step 2
    data = data.encode(MY_OUTPUT_ENCODING, errors=<some form of
replacement>).decode(MY_OUTPUT_ENCODING)

    # We now have subprocess output that's safe to display if requested.

As a side note, I find step 2 "sanitise my string to ensure it can be
safely output" to be a pretty common operation - possibly because
Python's standard IO streams raise exceptions on unicode errors - and
I'm surprised there isn't a better way to spell it than the
encode/decode pair above.

Paul
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig

Reply via email to