you had set LANG to *UTF-8*
and that is a multibyte encoding
$'\xe4' by itself is not a valid utf-8 sequence; it is the iso-8859-1 byte for a-umlaut
the example I gave contained the utf-8 for eszett (sharp s), $'\303\237'
the utf-8 for a-umlaut is $'\303\244'
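
a minimal sketch of the byte-level difference, assuming a POSIX sh with printf and od available. hexbytes and utf8_byte_kind are hypothetical helper names made up for this illustration, not part of ksh or any library. the second helper classifies a single byte by its utf-8 role, which shows why a lone 0xe4 cannot stand by itself: it announces a 3-byte sequence that never arrives.

```sh
# Print the bytes of a string as space-separated hex, in stream order.
hexbytes() {
    set -- $(printf '%s' "$1" | od -An -tx1)
    echo "$*"
}

# Classify one byte (given as two hex digits) by its role in utf-8.
utf8_byte_kind() {
    b=$(( 0x$1 ))
    if   [ "$b" -lt 128 ]; then echo ascii          # 00..7f
    elif [ "$b" -lt 192 ]; then echo continuation   # 80..bf
    elif [ "$b" -lt 194 ]; then echo invalid        # c0..c1 (overlong forms)
    elif [ "$b" -lt 224 ]; then echo lead2          # c2..df
    elif [ "$b" -lt 240 ]; then echo lead3          # e0..ef
    elif [ "$b" -lt 245 ]; then echo lead4          # f0..f4
    else echo invalid                               # f5..ff
    fi
}

hexbytes "$(printf '\303\244')"   # a-umlaut as utf-8: c3 a4
hexbytes "$(printf '\303\237')"   # eszett as utf-8:   c3 9f
hexbytes "$(printf '\344')"       # the lone byte from $'\xe4': e4

utf8_byte_kind c3   # lead2: expects one continuation byte
utf8_byte_kind a4   # continuation
utf8_byte_kind e4   # lead3: expects two continuation bytes
```

so c3 a4 is a complete 2-byte sequence, while e4 followed by a newline (an ascii byte) is a 3-byte lead with nothing to continue it.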

od -tx1 shows that this message contains $'\xe4' but it doesn't
render as a-umlaut -- does it render as a-umlaut with LANG=C for you?
if so then there must be some window/terminal settings in play
that are independent of $LANG / $LC_*
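
a note on why od -tx1 is used here instead of the od -x dumps quoted below: od -x prints 16-bit words in host byte order, so on a little-endian machine (an assumption, but consistent with the quoted output) each pair of bytes appears swapped. a small sketch for reading such a word back into stream order; swap_word is a hypothetical helper name for illustration only.

```sh
# Turn one 4-hex-digit word from `od -x` (little-endian host assumed)
# back into the two bytes in the order they appeared in the stream.
swap_word() {
    first=${1#??}    # last two hex digits: the byte that came first
    second=${1%??}   # first two hex digits: the byte that came second
    printf '%s %s\n' "$first" "$second"
}

swap_word a4c3   # c3 a4  (the utf-8 a-umlaut)
swap_word 000a   # 0a 00  (newline, then the zero od pads an odd count with)
swap_word 0ae4   # e4 0a  (the lone 0xe4 byte, then newline)
```

od -An -tx1 avoids the issue entirely because it prints one byte at a time.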

I'm pretty sure both the env and term settings must cooperate in
order to get consistent rendering

i.e., both the program and the terminal must know that utf-8 is enabled
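
a rough sketch of checking the program side of that, assuming the usual POSIX precedence LC_ALL, then LC_CTYPE, then LANG. effective_ctype and is_utf8_name are hypothetical helper names for this illustration, and the name match is only a heuristic; `locale charmap` reports what the C library actually selected. nothing here can see the terminal's own encoding setting, which has to be checked in the terminal emulator itself.

```sh
# Pick the locale name that governs character handling.
# Arguments stand in for LC_ALL, LC_CTYPE and LANG, in that order.
effective_ctype() {
    if   [ -n "$1" ]; then printf '%s\n' "$1"
    elif [ -n "$2" ]; then printf '%s\n' "$2"
    elif [ -n "$3" ]; then printf '%s\n' "$3"
    else printf 'C\n'   # assumed default; strictly it is implementation-defined
    fi
}

# Heuristic: does a locale name advertise utf-8?
is_utf8_name() {
    case $1 in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

name=$(effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}")
if is_utf8_name "$name"; then
    echo "program side: $name looks like utf-8"
else
    echo "program side: $name does not look like utf-8"
fi
```

note that LC_ALL wins over everything, so LANG=C has no effect on character handling if LC_ALL or LC_CTYPE is still set to a utf-8 locale.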

On Sun, 28 Oct 2007 11:03:57 +0100 Bernd Eggink wrote:
> Glenn Fowler schrieb:
> > some of your characters did not translate properly for email

> Here is a simpler example:

> $ a=( name='ä' )      # lower case a umlaut (unicode=00E4)
> $ print $a
> ( name=ä )    # correct

> $ print "$a"
> (
>       name==Ã
> )

> Instead of the lower case a umlaut, an upper case A with tilde (unicode 
> 00C3) appears, and the equal is doubled. This happens on an X terminal 
> (urxvt) as well as on a normal Linux tty.

> However, with the assignment

> $ a=( name=$'\xe4' )

> both outputs look fine. I noticed that entering 'ä' (a umlaut) and $'\xe4' 
> produces the same character on the screen, but different byte sequences:

> $ print 'ä' | od -x

> 0000000 a4c3 000a

> $ print $'\xe4' | od -x

> 0000000 0ae4

> The same happens under bash, so it might be a matter of some system setting 
> and/or my lack of understanding. Any clarification is appreciated!

_______________________________________________
ast-developers mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-developers
