On Mon, 6 Aug 2001 [EMAIL PROTECTED] wrote:
> While using Backspace or Delete to erase character in Xterm with
> UTF-8 support , it will not work properly. It will accept to more
> Backspace for a single character.
>
> Is there any solution available.

Please describe your problem in detail. Ideally, send a Perl command
line that demonstrates the problem.

Backspace certainly works nicely for single-width characters:

$ perl -e 'use utf8; print "A\x{20ac}\bC\n";'
AC

The 3-byte EURO SIGN gets removed with one single backspace.

There is some controversy about what should happen when a backspace is
issued while the cursor is to the right of a double-width character
such as U+30AC (KATAKANA LETTER GA). What happens at the moment is:

$ perl -e 'use utf8; print "A\x{30ac}\bC\n";'
AXC

where X is the remaining left half of U+30AC.

I personally would prefer it if backspace erased the entire character to
the left of the cursor and, if that was a base character, moved the cursor
onto its position. This (together with some other fine tuning of the exact
cursor semantics) would make it possible to implement primitive line
editors like the kernel tty "cooked mode" without knowing about the
distinction between single-width characters, double-width characters, and
combining characters. In the context of combining characters, this means
that backspace first strips one combining character at a time off the
character to the left of the cursor before removing the base character and
moving onto that cell. In the context of wide characters, this would mean
that a single backspace would remove, for example, one Kanji.
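
As a rough illustration of the behaviour preferred above, here is a
minimal Python sketch of how a terminal might track one line. It is a
model under stated assumptions, not how xterm works: the class Line and
the helper feed are hypothetical names, the cursor is assumed to sit at
the end of the line, and Python's unicodedata tables stand in for the
terminal's own width and combining-character data.

```python
import unicodedata


def is_combining(ch):
    # Approximation: non-zero canonical combining class.
    return unicodedata.combining(ch) != 0


def is_wide(ch):
    # Approximation: East Asian Wide or Fullwidth occupies two cells.
    return unicodedata.east_asian_width(ch) in ("W", "F")


class Line:
    """Hypothetical model of one terminal line with the cursor at its end."""

    def __init__(self):
        # Each entry is [base, combining mark, combining mark, ...].
        self.clusters = []

    def put(self, ch):
        if is_combining(ch) and self.clusters:
            self.clusters[-1].append(ch)
        else:
            self.clusters.append([ch])

    def backspace(self):
        if not self.clusters:
            return
        last = self.clusters[-1]
        if len(last) > 1:
            last.pop()           # strip one combining character
        else:
            self.clusters.pop()  # remove the base character, wide or not

    def cursor_column(self):
        # Column of the cursor, counted in character cells from 0.
        return sum(2 if is_wide(c[0]) else 1 for c in self.clusters)

    def text(self):
        return "".join("".join(c) for c in self.clusters)


def feed(line, data):
    # Treat "\b" as the erase function; everything else is printed.
    for ch in data:
        if ch == "\b":
            line.backspace()
        else:
            line.put(ch)
    return line
```

With this model, feeding "A\u30ac\bC" leaves the text "AC", so the
whole wide character is gone after one backspace, and feeding
"e\u0301\b" leaves just "e". All knowledge of widths and combining
characters stays inside the terminal.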

Various Japanese users told us, however, that they strongly prefer that
Backspace always moves one character cell, not one character. The only
rationale given was that this is how existing EUC terminal emulators such
as kterm have worked for years and that they have got used to that. I'm
not entirely convinced yet, but I haven't made a big fuss about it so far,
because we need some stability; otherwise the developers of proper
editors will become annoyed.

What I suggest to developers of full-screen editors and similar software
is to use only numeric cursor positioning commands, as it is undisputed
that their semantics are defined in terms of character cell coordinates.
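
To make that suggestion concrete, here is a small Python sketch that
builds the ECMA-48 sequences CUP (CSI row ; column H) and CHA
(CSI column G), both of which take 1-based cell coordinates. The helper
names cup, cha and erase_cells are made up for this example, and the
caller is assumed to know already how many cells the character occupies.

```python
CSI = "\x1b["


def cup(row, col):
    # CUP: move the cursor to the given 1-based row and column.
    return f"{CSI}{row};{col}H"


def cha(col):
    # CHA: move the cursor to the given 1-based column of the current row.
    return f"{CSI}{col}G"


def erase_cells(row, col, width):
    # Go to the first cell of the character, blank every cell it
    # occupies with spaces, then return to that first cell.
    return cup(row, col) + " " * width + cup(row, col)
```

For example, erase_cells(1, 2, 2) blanks a double-width character that
starts in column 2 of the first row without sending any backspace, so
the result does not depend on how BS treats wide characters.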

I think we clearly need one terminal emulator control function that
erases the character left of the cursor. This could be

  - BS
  - DEL
  - a new control sequence

The current situation is that the committee that wrote ECMA-48 does not
exist any more and we here are probably the ones most aware of the need
for a next-generation terminal semantics standard that resolves these
issues surrounding ECMA-48 and UCS.

One of the requirements that I suggest for such a next-generation terminal
standard is that it is possible to write a simple readline-like editor
without access to wcwidth. I think that is doable with care.
Readline-like editors become hardcoded, for example, in terminal drivers,
where they are not easily updated when new Unicode characters get added.
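
A hedged sketch of what this requirement would buy the editor, in
Python: the first function is all a cooked-mode style editor would need
if the terminal offered an "erase the character left of the cursor"
function, while the second shows the width lookup it needs when
backspace moves by cells. ERASE, erase_proposed and erase_cell_based are
hypothetical names, and "\b" is only a placeholder for whichever control
function would eventually be chosen.

```python
# Assumption: placeholder for the chosen erase function
# (BS, DEL, or a new control sequence).
ERASE = "\b"


def erase_proposed(buffer):
    """Drop the last code point and return (new buffer, echo string).

    No width table is consulted; the terminal decides what to erase.
    """
    if not buffer:
        return buffer, ""
    return buffer[:-1], ERASE


def erase_cell_based(buffer, width_of):
    """Same operation when backspace moves one cell at a time.

    width_of is a wcwidth-like callback supplied by the caller.
    """
    if not buffer:
        return buffer, ""
    width = width_of(buffer[-1])
    # Conventional echo: step back, overwrite with a space, step back.
    return buffer[:-1], "\b \b" * width
```

Note that the cell-based variant echoes nothing for a zero-width
combining character, so it would have to redraw the base character to
remove the mark, and its width table goes stale whenever new Unicode
characters are added.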

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
