you had set LANG to *UTF-8*, and that is a multibyte encoding
$'\xe4' on its own is not a valid utf-8 sequence
the example I gave contained the utf-8 for eszett (ß), $'\303\237'
the utf-8 for a-umlaut is $'\303\244'
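To make the byte-level difference concrete, here is a minimal sketch, assuming a POSIX shell with `od`, `tr` and `iconv` installed. The helper names `hex` and `is_utf8` are made up for illustration, and the octal escapes are used so the result does not depend on how this message itself was encoded.

```shell
# Hex dump of stdin, with spaces and newlines stripped.
hex() { od -An -tx1 | tr -d ' \n'; }

# Hypothetical helper: succeeds if stdin is well-formed UTF-8.
# Assumes iconv is installed and exits non-zero on invalid input.
is_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

utf8_hex=$(printf '\303\244' | hex)   # a-umlaut encoded as UTF-8
latin1_hex=$(printf '\344' | hex)     # a-umlaut in ISO-8859-1, the same byte as $'\xe4'

echo "utf-8 bytes:      $utf8_hex"
echo "iso-8859-1 bytes: $latin1_hex"

# A lone 0xe4 is the lead byte of a three-byte UTF-8 sequence,
# so by itself it should be rejected.
if printf '\344' | is_utf8; then lone_e4=valid; else lone_e4=invalid; fi
echo "lone 0xe4 as utf-8: $lone_e4"
```

The two dumps should come out as `c3a4` and `e4`. Note that `od -x` prints 16-bit words, so on a little-endian machine the bytes `c3 a4` show up as `a4c3`; `od -tx1` avoids that byte swapping.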
od -tx1 shows that this message contains $'\xe4', but it doesn't render as a-umlaut
does it render as a-umlaut with LANG=C for you?
if so, then there must be some window/terminal settings in play
that are independent of $LANG / $LC_*

I'm pretty sure both the env and term settings must cooperate
in order to get consistent rendering,
i.e., both the program and the terminal must know that utf-8 is enabled

On Sun, 28 Oct 2007 11:03:57 +0100 Bernd Eggink wrote:
> Glenn Fowler wrote:
> > some of your characters did not translate properly for email
>
> Here is a simpler example:
>
>    $ a=( name='ä' )   # lower case a umlaut (unicode=00E4)
>    $ print $a
>    ( name=ä )         # correct
>    $ print "$a"
>    (
>         name==Ã
>    )
>
> Instead of the lower case a umlaut, an upper case A with tilde (unicode
> 00C3) appears, and the equal sign is doubled. This happens on an X terminal
> (urxvt) as well as on a normal Linux tty.
>
> However, with the assignment
>
>    $ a=( name=$'\xe4' )
>
> both outputs look fine. I noticed that entering 'ä' (a umlaut) and $'\xe4'
> produce the same character on the screen, but different byte sequences:
>
>    $ print 'ä' | od -x
>    0000000 a4c3 000a
>    $ print $'\xe4' | od -x
>    0000000 0ae4
>
> The same happens under bash, so it might be a matter of some system setting
> and/or my lack of understanding. Any clarification is appreciated!
