Re: [SCM] GNU Autoconf source repository branch, master, updated. v2.65-35-ga2889ee

Eric Blake Wed, 03 Feb 2010 05:08:26 -0800

According to Paolo Bonzini on 2/3/2010 2:48 AM:
> On 02/01/2010 07:44 PM, Ralf Wildenhues wrote:
>> On my Cygwin, C means UTF-8. I suppose though that's just because
>> Cygwin changed recently.
> 
> That's a royally bad idea, since the only portable way to handle files
> that potentially contain invalid multibyte sequences, is to set the
> locale to C.


There was a HUGE thread on this topic on both the cygwin and Austin Group
mailing lists, which I don't want to repeat here.  If you want to complain
about cygwin's choice of locale, take it to the cygwin list.  That said:

Cygwin 1.7.1 defaults to C.UTF-8 in the absence of a specific request, and
treats C like C.UTF-8, but you can select a unibyte locale with C.ASCII.

The upcoming Cygwin 1.7.2 will continue to default to C.UTF-8, but will
treat C like C.ASCII.

POSIX states that the C locale can be used in any byte context for all 256
bytes.  But for character contexts, it can only portably be used for
characters < 128.  Cygwin satisfies these rules, whether you use C.UTF-8
or C.ASCII (and therefore, whether you use C in cygwin 1.7.1 or C in
cygwin 1.7.2).  And any program that depends on the "C" locale providing
strictly unibyte encoding of characters is broken, per POSIX, so in a way,
cygwin 1.7.1 is doing a favor by helping root out non-portable programs.
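
To make the byte/character distinction concrete, here is a rough sketch
in Python rather than anything taken from Autoconf or cygwin.  The helper
names (classify_bytes, count_lines) are made up for illustration.  It
assumes the data is handled purely as bytes, which works for all 256 byte
values, and that only values below 128 are given a character meaning:

```python
def classify_bytes(data: bytes):
    """Split a buffer into (portable_chars, opaque_bytes) counts.

    Hypothetical helper: bytes < 128 are the only ones treated as
    characters; everything else is carried along as an opaque byte.
    """
    portable = sum(1 for b in data if b < 128)
    return portable, len(data) - portable


def count_lines(data: bytes) -> int:
    """Count newlines in byte context.

    No decoding happens, so invalid multibyte sequences cannot cause
    an error here.
    """
    return data.count(b"\n")


if __name__ == "__main__":
    sample = b"caf\xc3\xa9\n\xff\xfe broken\n"
    print(classify_bytes(sample))
    print(count_lines(sample))
```

A program written this way behaves the same whether C is treated like
C.UTF-8 or like C.ASCII, because it never asks the locale what a byte
of 128 or above means as a character.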

-- 
Don't work too hard, make some time for fun as well!

Eric Blake             [email protected]
