According to Paolo Bonzini on 2/3/2010 2:48 AM:
> On 02/01/2010 07:44 PM, Ralf Wildenhues wrote:
>> On my Cygwin, C means UTF-8. I suppose though that's just because
>> Cygwin changed recently.
>
> That's a royally bad idea, since the only portable way to handle files
> that potentially contain invalid multibyte sequences, is to set the
> locale to C.

There was a HUGE thread on this topic on both the cygwin and Austin Group mailing lists, which I don't want to repeat here. If you want to complain about cygwin's choice of locale, take it to the cygwin list.

That said: Cygwin 1.7.1 defaults to C.UTF-8 in the absence of a specific request, and treats C like C.UTF-8, but you can select a unibyte locale with C.ASCII. The upcoming Cygwin 1.7.2 will continue to default to C.UTF-8, but will treat C like C.ASCII.

POSIX states that the C locale can be used in any byte context for all 256 bytes. But for character contexts, it can only portably be used for characters < 128. Cygwin satisfies these rules, whether you use C.UTF-8 or C.ASCII (and therefore, whether you use C in cygwin 1.7.1 or C in cygwin 1.7.2). And any program that depends on the "C" locale providing strictly unibyte encoding of characters is broken, per POSIX, so in a way, cygwin 1.7.1 is doing a favor by helping root out non-portable programs.

-- 
Don't work too hard, make some time for fun as well!

Eric Blake [email protected]
