Hello,

I've noticed something unexpected when copy-pasting UTF-8 characters in
xterm: xterm seems to change some of the characters into different but
visually identical ones.  Here's an example (using ksh):

$ uname -a
OpenBSD foo.my.domain 6.1 GENERIC#19 i386
$ ls
Thérèse
$ ls | od -c
0000000    T   h   e 314 201   r   e 314 200   s   e  \n                
0000014
$ cp Thérèse Thérèse

I typed this copy command as follows: type 'cp ', press Tab so that ksh
auto-completes the first filename, type another space, then use the mouse
to copy-paste the first filename into xterm to get the second filename.
The cp command completes without any error.  The result is:

$ ls
Thérèse Thérèse
$ ls | od -c
0000000    T   h   e 314 201   r   e 314 200   s   e  \n   T   h 303 251
0000020    r 303 250   s   e  \n                                        
0000026

Note how the two filenames look exactly the same but are actually different
byte sequences.  So it looks like xterm changes "e 314 201" into "303 251"
and "e 314 200" into "303 250" when copy-pasting, which came as rather a
surprise to me.  I'm fairly sure the problem is with xterm, not with ksh,
because the same thing happens with bash (using a similar xterm and running
bash over ssh on a Linux machine).
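For what it's worth, the two byte sequences can be reproduced with a short Python sketch.  The original filename spells each accented letter as a plain "e" followed by a combining accent (U+0301 acute, U+0300 grave), while the pasted one uses the precomposed characters U+00E9 and U+00E8.  The sketch assumes that what xterm does on paste is equivalent to Unicode NFC normalization; that is only my guess from the bytes, not something I've confirmed in the xterm source.  The helper od_octal is hypothetical and just mimics how od -c prints non-ASCII bytes in octal.

```python
import unicodedata

# Filename as created on disk: base "e" followed by combining accents
# (U+0301 COMBINING ACUTE ACCENT, U+0300 COMBINING GRAVE ACCENT).
decomposed = "The\u0301re\u0300se"

# Assumption: the pasted text is the NFC-normalized form of the original,
# i.e. precomposed U+00E9 and U+00E8.
composed = unicodedata.normalize("NFC", decomposed)

def od_octal(s):
    """Hypothetical helper: UTF-8 bytes of s, non-ASCII bytes in octal like od -c."""
    out = []
    for b in s.encode("utf-8"):
        out.append(chr(b) if b < 0x80 else format(b, "o"))
    return " ".join(out)

print(od_octal(decomposed))  # T h e 314 201 r e 314 200 s e
print(od_octal(composed))    # T h 303 251 r 303 250 s e
print(decomposed == composed)                                 # False
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```

So the two names are canonically equivalent Unicode strings (same text, different normalization forms), which is why they render identically but are distinct filenames to the filesystem.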

Is this normal / expected?

For info:

$ cat .Xdefaults
xterm*background:       black
xterm*foreground:       white
xterm*metaSendsEscape:  true
xterm*multiScroll:      true
xterm*saveLines:        256
xterm*scrollBar:        true
xterm*scrollKey:        true
xterm*scrollTtyOutput:  false
xterm*utf8Title:        true
xterm*utmpInhibit:      true
xterm*visualBell:       true
$ set | egrep -i utf
LC_CTYPE=en_US.UTF-8
XTERM_LOCALE=en_US.UTF-8

Thanks,

Philippe


