On Sun, 28 Oct 2007 12:17:39 +0100 Bernd Eggink wrote:
> Both $'\xe4' and $'\303\244' render as a-umlaut with LANG=de_DE.UTF-8.
> Only $'\303\244' does so with LANG=C.
> I'm not quite convinced that this isn't a ksh issue...
> Thanks anyway!
> Confused,
> Bernd

confused here too
you and I aren't getting off that easy

part of the problem is that I'm usually in LANG=C and am oblivious
to many of the locale subtleties,
so the following analysis could be off base
please jump in and correct any errors in the logic

I do know this much about utf-8 encoding:
the leftmost 1-bits in the first byte of a utf-8 sequence give the total
number of bytes in the encoded character, and each continuation byte
starts with the bits 10

for the one I specified:

    $ printf $'%..2u ' 0303 0244; print
    11000011 10100100
    2        1

for the one you specified, I'm guessing the 8-bit ascii (ISO-8859-1) a-umlaut,

    $ printf $'%..2u ' 0xe4; print
    11100100
    3

which for a utf-8 encoded app means "this utf-8 encoding takes up 3 bytes"
but the app is only presented with 1 byte,
so there is an encoding error and all bets are off
in particular I see '\xe4' as space, not a-umlaut

is there a tty setting that says "if it's not utf-8, try 8-bit ascii"?

--Glenn
_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers
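
The leading-bit check in the message above can be sketched in portable
shell. This is only an illustration, not how ksh or the terminal decodes
input: the function names leading_ones and classify are hypothetical, and
the classification is deliberately simplified (it does not reject the
lead bytes 0xc0, 0xc1, or 0xf5 through 0xf7, which are not used in valid
UTF-8).

```shell
# leading_ones BYTE: count the leading 1-bits of an 8-bit value.
# BYTE may be decimal, hex (0x..), or octal (0..), as in the commands above.
# (hypothetical helper, not part of ksh)
leading_ones() {
    byte=$(( $1 & 0xff ))
    mask=128    # start at the most significant bit, 0x80
    n=0
    while [ "$mask" -gt 0 ] && [ $(( byte & mask )) -ne 0 ]; do
        n=$(( n + 1 ))
        mask=$(( mask >> 1 ))
    done
    echo "$n"
}

# classify BYTE: describe the role a byte plays in a UTF-8 stream,
# judging only by its leading 1-bits (simplified, see the note above).
classify() {
    ones=$(leading_ones "$1")
    case $ones in
        0)     echo "single-byte (ASCII)" ;;
        1)     echo "continuation byte" ;;
        2|3|4) echo "lead byte of a $ones-byte sequence" ;;
        *)     echo "invalid in UTF-8" ;;
    esac
}

classify 0303    # first byte of $'\303\244'
classify 0244    # second byte of $'\303\244'
classify 0xe4    # the lone byte of $'\xe4'
```

A lone 0xe4 is classified as the start of a 3-byte sequence, so a UTF-8
decoder that sees no continuation bytes after it has to treat it as an
encoding error.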
