Hello Alejandro,

ed works on both binary data and ASCII text, and treats both as a
sequence of individual bytes. Since ´ is a UTF-8 character, which
consists of the bytes C2 and B4, ed thinks it should delete only a
single byte, which leaves just C2 behind.

Your terminal can't interpret a lone C2, which in this particular case
results in a question mark.
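
To illustrate the byte arithmetic, here is a small Python sketch (not
ed's code, just the same operation on the same bytes). The Latin-1
decode at the end is my guess at why urxvt shows an Â: C2 is Â in
ISO 8859-1.

```python
# The acute accent from the report, U+00B4.
acute = "\u00b4"

# UTF-8 encodes it as two bytes: C2 B4.
encoded = acute.encode("utf-8")

# A byte-wise backspace removes only the last byte, leaving C2.
after_backspace = encoded[:-1]

print(encoded.hex(" ").upper())
print(after_backspace.hex(" ").upper())

# A lone C2 is an incomplete UTF-8 sequence, so a UTF-8 decoder
# substitutes the replacement character U+FFFD.
as_utf8 = after_backspace.decode("utf-8", errors="replace")

# Read as ISO 8859-1 (Latin-1), the same byte is a valid character.
as_latin1 = after_backspace.decode("latin-1")

print(repr(as_utf8), repr(as_latin1))
```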

The character disappears after the backspace because the presentation
layer gets the instruction to clear the column prior to the current
position, so it appears deleted on screen.

Currently there's no UTF-8 support in our ed, and I don't see how this
can be done without endangering the binary editing capabilities.
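
As a rough sketch of the trade-off, assuming a hypothetical helper
called delete_last_char (nothing like it exists in ed), a UTF-8-aware
backspace would have to strip trailing continuation bytes (10xxxxxx)
before removing the lead byte. That is right for text, but on binary
data the same rule removes bytes that were never one character:

```python
def delete_last_char(buf: bytes) -> bytes:
    """Hypothetical UTF-8-aware backspace; an illustration only."""
    end = len(buf)
    skipped = 0
    # A UTF-8 character has at most three continuation bytes,
    # each of the form 10xxxxxx.
    while end > 0 and skipped < 3 and (buf[end - 1] & 0xC0) == 0x80:
        end -= 1
        skipped += 1
    # Remove the lead byte (or the single ASCII byte).
    if end > 0:
        end -= 1
    return buf[:end]


# Text: the whole two-byte character is removed, as the user expects.
text_result = delete_last_char(b"a\xc2\xb4")

# Binary: three unrelated bytes vanish with one backspace.
binary_result = delete_last_char(b"\x01\x80\x80")

print(text_result, binary_result)
```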

martijn@

On 12/04/17 00:43, Alejandro G. Peregrina wrote:
> Hello,
> 
> I've noticed something unexpected when entering an accent character
> alone (´) and then deleting it in ed(1) in xterm(1). Instead of deleting
> it, it creates another character which is seen as an inverted
> exclamation (?) in the font 'misc-fixed'.
> 
>       How to reproduce:
> $ uname -a
> OpenBSD foo.my.domain 6.2 GENERIC.MP#1 amd64
> $ locale
> LANG=
> LC_COLLATE="C"
> LC_CTYPE=en_US.UTF-8
> LC_MONETARY="C"
> LC_NUMERIC="C"
> LC_TIME="C"
> LC_MESSAGES="C"
> LC_ALL=
> $ #Let's append the ´ character in ed(1)
> $ ed -p"> "
>> a
> ´
> 
>       Now let's delete with a backspace, return to create a newline and a dot
> to stop appending, and then print:
> 
> $ ed -p"> "
>> a
> 
> .
>> p
> (?)
> 
>       (The (?) is a simulation of the font character that misc-fixed shows to
> the terminal.)
> 
>       Whenever I use more(1) or less(1) to view it, it shows:
> 
> $ more test.txt
> <C2>
> 
> 
> 
> I have to add that I tested this with urxvt and ed(1) prints an Â
> character, but more(1) and less(1) keep printing <C2>.
> 
> When not using X this can't be reproduced. This is reproducible with
> xterm(1) and urxvt(1) in cwm(1) and fvwm(1). I've tested this in Linux
> and FreeBSD and this behaviour is not reproducible.
> 
> Thank you,
> A
> 
