Re: Displaying characters in user's locale

Gavin Smith Mon, 03 Feb 2014 13:31:36 -0800

On Sun, Feb 2, 2014 at 10:16 PM, Karl Berry <[email protected]> wrote:
>
>     * Default encoding is set as UTF-8 - decide whether this is desired
> All I can think of is to base the default on the current locale, because
> that's the only information we've got about what the user desires.
> E.g., if the locale is "C" (or, equivalently, "POSIX", of course), the
> target should be plain 7-bit ASCII.  If the locale is *.UTF-8, then the
> target should be UTF-8.  Etc.  (I don't know all the locale names used
> in this context, and can't find anything that seems like a comprehensive
> list, although it must be out there somewhere.)


To be clear, it is the default file encoding that is set to UTF-8, not
the output encoding; the output encoding is set from the locale. I
would think that we should leave files as they are if we don't know
their encoding. That way we don't risk breaking something that already
works.
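
As a rough illustration of the locale-to-encoding rule Karl describes,
here is a minimal Python sketch. The function name
encoding_from_locale_name is made up for this example, and it only
parses the conventional "language_TERRITORY.codeset@modifier" form of a
locale name; it is not how info does it. A real program would more
likely ask the C library (for example via nl_langinfo(CODESET)) rather
than parse the name itself.

```python
def encoding_from_locale_name(name):
    """Guess an output encoding from a POSIX-style locale name.

    Hypothetical helper for illustration. Returns None when the
    name carries no explicit codeset.
    """
    # "C" and "POSIX" are equivalent and imply plain 7-bit ASCII.
    if name in ("C", "POSIX"):
        return "ASCII"
    # Drop an optional "@modifier" suffix, e.g. "de_DE.ISO-8859-15@euro".
    base = name.split("@", 1)[0]
    # The codeset, if present, follows the first dot.
    if "." in base:
        codeset = base.split(".", 1)[1]
        return codeset.upper() if codeset else None
    return None


for locale_name in ("C", "en_US.UTF-8", "de_DE.ISO-8859-15@euro", "fr_FR"):
    print(locale_name, "->", encoding_from_locale_name(locale_name))
```

A name such as "fr_FR" gives no codeset at all, which is one reason a
lookup through the C library would be more reliable than this kind of
string parsing.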

On the subject of interpreting ISO-8859 text as UTF-8 and passing
through any unrecognized byte sequences, I think Per Bothner is right
that this could fail. The problem is smaller than it might be because
the ISO-8859 encodings have a gap from 0x80 to 0x9f that holds no
printable characters, so a byte sequence like 110xxxxx 10yyyyyy could
only be incorrectly interpreted as UTF-8 if the second byte was in the
range 0xa0 to 0xbf. That is, there are 32 characters we could lose,
which might not be used much anyway in existing info files. I'd like
some better evidence that it wouldn't be a problem, though.
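
To make the byte ranges concrete, here is a small Python sketch that
enumerates the pairs of printable ISO-8859-1 high bytes that also form a
well-formed two-byte UTF-8 sequence. The function names are made up for
this example. It assumes the text contains no bytes in 0x80 to 0x9f, and
it ignores three-byte and longer sequences. Lead bytes 0xc0 and 0xc1 are
excluded because they can only produce overlong encodings, which strict
UTF-8 decoders reject.

```python
def is_utf8_two_byte_sequence(b1, b2):
    """True if (b1, b2) is a well-formed two-byte UTF-8 sequence."""
    # Lead byte 110xxxxx (minus the overlong-only 0xC0 and 0xC1),
    # followed by a continuation byte 10yyyyyy.
    return 0xC2 <= b1 <= 0xDF and 0x80 <= b2 <= 0xBF


def ambiguous_pairs():
    """Pairs of printable ISO-8859-1 high bytes that also parse as UTF-8."""
    # Assumption: printable high characters occupy 0xA0..0xFF only.
    high = range(0xA0, 0x100)
    return [(b1, b2) for b1 in high for b2 in high
            if is_utf8_two_byte_sequence(b1, b2)]


pairs = ambiguous_pairs()
second_bytes = sorted({b2 for _, b2 in pairs})
print(len(second_bytes), hex(second_bytes[0]), hex(second_bytes[-1]))

# One concrete collision: the bytes C3 A9 are two characters in
# ISO-8859-1 but a single character when decoded as UTF-8.
sample = bytes([0xC3, 0xA9])
print(repr(sample.decode("latin-1")), repr(sample.decode("utf-8")))
```

The set of possible second bytes is exactly the 32 values from 0xa0 to
0xbf, matching the count above. Whether such pairs actually occur in
existing info files is a separate question that this sketch does not
answer.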
