Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-10-03 Thread Tomohiro KUBOTA

Hi,

At Wed, 2 Oct 2002 23:21:12 +0700,
Theppitak Karoonboonyanan wrote:

 This may look awkward for the definition of 0x08 to move back
 inconsistently. But the situation can still be defined more gracefully
 if we allow the cursor to stop at each combining character, and moving
 left through a combined cell means moving through the combining
 characters one by one to the base character before advancing to previous
 cell. This implementation has been adopted by some locally-patched
 terminal emulator, such as xiterm+thai (available in debian sid).

This idea is inconsistent with existing software, where the cursor
moves one column (half a character) even when it moves across
doublewidth characters.  There is also existing software which
treats combining characters the same way (contrary to your idea).

I think your idea could be implemented so that it is enabled only when
some option is specified.  However, in the future, I think a single
internationalized program should work well for everyone in the world.
To achieve this, the terminal's behavior must be defined consistently.

At the very least, the definition of 0x08 must not be modified.  Instead,
new control codes could be added for the character-element-based
movement you propose.


---
Tomohiro KUBOTA [EMAIL PROTECTED]
http://www.debian.or.jp/~kubota/
Introduction to I18N  http://www.debian.org/doc/manuals/intro-i18n/
___
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n



Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-10-03 Thread Theppitak Karoonboonyanan

On Thu, Oct 03, 2002 at 06:11:47PM +0900, Tomohiro KUBOTA wrote:
 
 At Wed, 2 Oct 2002 23:21:12 +0700,
 Theppitak Karoonboonyanan wrote:
 
  This may look awkward for the definition of 0x08 to move back
  inconsistently. But the situation can still be defined more gracefully
  if we allow the cursor to stop at each combining character, and moving
  left through a combined cell means moving through the combining
  characters one by one to the base character before advancing to previous
  cell. This implementation has been adopted by some locally-patched
  terminal emulator, such as xiterm+thai (available in debian sid).
 
 This idea is inconsistent with already existing softwares, where
 cursor moves one column (half character) even when it moves across
 doublewidth characters.  There are also existing softwares which
 treats combining characters in this way (against your idea).
 
 I think your idea can be implemented to be enabled only when some
 option is specified.  However, in future, I think one internationalized
 software should work well for all people in the world.  To achieve
 this, terminal's behavior must be defined consistently.
 
 At least, definition of 0x08 must not be modified.  In this case,
 new control codes would be added for character-element-based
 movement of your idea.

Yes, you're right. Unlike what I proposed, new control codes would not
compromise any existing requirement.

Another possibility is that bash or any other shell could be modified so
that it doesn't emit 0x08 when backspacing over combined cells.
Hmm.. This may be more sound.

-Thep.
-- 
Theppitak Karoonboonyanan
Thai Linux Working Group (TLWG)
http://linux.thai.net/thep/



Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-10-03 Thread Tomohiro KUBOTA

Hi,

At Thu, 3 Oct 2002 16:53:02 +0700,
Theppitak Karoonboonyanan wrote:

 Another possibility is that bash or any other shells be modified so that
 they don't emit 0x08 in case of backspace upon combined cells.
 Hmm.. This may be more sound.

In *any* case, bash (or any other shell) doesn't emit 0x08 alone when the
BACKSPACE key is pressed.  For example, after a normal (singlewidth)
character, 0x08 0x20 0x08 is emitted.  Thus, you don't need to regard a
departure from "BACKSPACE key equals 0x08" as something bad.


To erase only the last combining character element, i.e., to implement
your preferred behavior, the shell must emit 0x08 followed by the base
character code.  If the previous combined character consists of one base
character and multiple combining characters, the combining character
codes (all but the last one) must be emitted as well.  The shell must
also erase the last character from its internal buffer.  Of course,
that character may be one byte (for example, in TIS-620) or more (for
example, in UTF-8).

Conversely, for the BACKSPACE key to erase the whole combined character,
the shell must emit 0x08 0x20 0x08 and erase the whole combined character
from its internal buffer.
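
The two behaviors described above can be sketched roughly as follows. This is only an illustration and not bash's actual code: the function names are hypothetical, a combined cell is assumed to occupy exactly one column, and the terminal is assumed to replace the whole cell when a base character is written over it.

```python
import unicodedata

BS = "\x08"  # the 0x08 control character


def split_last_cell(buffer: str) -> tuple[str, str]:
    """Split buffer into (text before the last cell, last cell).

    Assumption: a cell is one base character plus any following
    characters whose Unicode category starts with "M" (combining marks).
    """
    i = len(buffer)
    while i > 0 and unicodedata.category(buffer[i - 1]).startswith("M"):
        i -= 1
    i = max(i - 1, 0)  # step over the base character itself
    return buffer[:i], buffer[i:]


def erase_last_element(buffer: str) -> tuple[str, str]:
    """BACKSPACE removes only the last typed character.

    Returns (new buffer, characters to send to the terminal).
    """
    head, cell = split_last_cell(buffer)
    if not cell:
        return buffer, ""
    if len(cell) == 1:
        # Plain base character: the usual erase sequence.
        return head, BS + " " + BS
    # Move back onto the cell, then redraw it without its last element.
    remaining = cell[:-1]
    return head + remaining, BS + remaining


def erase_whole_cell(buffer: str) -> tuple[str, str]:
    """BACKSPACE removes the whole combined cell."""
    head, cell = split_last_cell(buffer)
    if not cell:
        return buffer, ""
    return head, BS + " " + BS


if __name__ == "__main__":
    cell = "\u0e01\u0e35\u0e48"  # KO KAI + SARA II + MAI EK
    new_buffer, output = erase_last_element("ab" + cell)
    # The same output is one byte per character in TIS-620 and up to
    # three bytes per Thai character in UTF-8.
    print(output.encode("tis_620"), output.encode("utf-8"))
```

Both functions return the updated internal buffer together with the terminal output, since the screen and the buffer only stay consistent if the shell changes them together.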


I think both behaviors are technically possible.  I don't have
any opinion on which behavior should be implemented (or which should be
the default if both are implemented).  I think it is a good idea
to consult the people who wrote the bash-2.05b i18n patch about Thai
users' expectations of BACKSPACE key behavior.

---
Tomohiro KUBOTA [EMAIL PROTECTED]
http://www.debian.or.jp/~kubota/
Introduction to I18N  http://www.debian.org/doc/manuals/intro-i18n/



Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-10-03 Thread Markus Kuhn

I had originally argued strongly in favour of a BACKSPACE display
semantics that removes the character left of the cursor (let's call this
character L), and then moves the cursor wcwidth(L) character cells
to the left. This is by far the most sensible solution, because this
way, if you echo the keyboard output back into the display, pressing
backspace will give you exactly the same effect as you would expect
in an editor. The result would have been that, in order for backspace
to work correctly with double-width (and combining) characters,
no changes would have had to be made to the tty cooked mode editor in
the kernel that you get when you type text into stdin of any
Unix application.

Unfortunately, existing CJK implementation practice has messed this
up and has used backspace with a move-cursor-left-one-cell display
semantics. An argument has been made that UTF-8 modes have to stay
compatible with this highly unfortunate and inconvenient CJK
implementation practice, but I am still not convinced that

  a) there really is such a backwards compatibility requirement
  b) the 1-cell-left semantics of backspace has any advantage
 over the erase-1-character-left semantics whatsoever

I would say at least that the jury is still out on what a backspace
sent to a UTF-8 terminal means, and I'd advise authors of editors
not to send any backspace 0x08 characters to terminals. Please use
absolute or relative cursor positioning command sequences, which have
unambiguous semantics.
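
As a rough illustration of that advice, the sketch below erases a character with the ECMA-48 CUB (Cursor Backward, `CSI n D`) sequence instead of 0x08. The helper names are hypothetical, the caller is assumed to know the column width of the character already, and line wrapping at the left margin is ignored.

```python
CSI = "\x1b["  # ECMA-48 Control Sequence Introducer


def cursor_back(n: int) -> str:
    """Return the CUB sequence that moves the cursor n columns left."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return f"{CSI}{n}D"


def erase_columns(width: int) -> str:
    """Blank the `width` columns left of the cursor.

    The cursor ends up at the start of the blanked columns.
    """
    return cursor_back(width) + " " * width + cursor_back(width)


if __name__ == "__main__":
    # Erase a doublewidth character without sending any 0x08.
    print(repr(erase_columns(2)))
```

The column count is stated explicitly in the sequence, so the result does not depend on how a given terminal interprets a bare 0x08.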

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/



Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-10-02 Thread Theppitak Karoonboonyanan

On Sun, Sep 29, 2002 at 10:50:05PM +0700, Theppitak Karoonboonyanan wrote:
 On Sat, Sep 28, 2002 at 09:49:43AM +0900, Tomohiro KUBOTA wrote:
  I think this choice is shell's responsibility, not terminal.
  This is because now single BACKSPACE means keyboard typing,
  not submitting of 0x08 to tty.
  
  Submitting of 0x08 to tty should *always* move cursor left
  one column, regardless of what character is written on the
  left column.  This is because terminals cannot tell the
  context of accepted 0x08.  Many softwares uses 0x08 to
  move one column left.
  
  For example, even if the left column is doublewidth character,
  0x08 moves *one* column and the cursor will be located at the
  right half of the doublewidth character.  Bash-2.05b is aware
  of this behavior and, when BACKSPACE key is pressed after
  doublewidth character, bash issues 0x08 0x08 0x20 0x20 0x08 0x08 
  to the tty to erase the whole doublewidth character.  (It is
  more complex in real, to handle line folding.)
 
 Seems so clear to me now. Thanks for the explanation. So, let's
 move to bash (and whatever other shells). :)
 
 And the discussion should be moved to li18nux instead of xfree86-i18n.
 So, I'll stop following up this thread in xfree86-i18n after this one.

Umm.. After digging through some code, I've changed my mind.

Is it still possible for bash to tell xterm to remove just the last
combining character of previous cell, while the cursor stands still?

In that case, CursorBack() in cursor.c needs to retrieve screen buffer
(using SCRN_BUF_COM1L() and so on) to determine whether to move back.

This may look awkward, because it makes 0x08 move back inconsistently.
But the behavior can still be defined gracefully if we allow the cursor
to stop at each combining character, so that moving left through a
combined cell means moving through the combining characters one by one
down to the base character before advancing to the previous cell. This
approach has been adopted by some locally patched terminal emulators,
such as xiterm+thai (available in debian sid).

Is it possible for xterm?
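
A minimal model of the proposed cursor rule, for illustration only. It is not taken from xterm or xiterm+thai: the `Cursor` type and `move_left` function are hypothetical, each cell is represented as a non-empty string holding the base character followed by its combining characters, and the cursor is assumed to land on the last combining character when it enters the previous cell.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    cell: int     # index of the cell the cursor is in
    element: int  # index within that cell; 0 is the base character


def move_left(cells: list[str], cur: Cursor) -> Cursor:
    """Apply one leftward step under the proposed rule."""
    if cur.element > 0:
        # Still inside a combined cell: step back one combining character.
        return Cursor(cur.cell, cur.element - 1)
    if cur.cell > 0:
        # On a base character: advance to the previous cell, entering it
        # at its last element (an assumption of this sketch).
        prev = cur.cell - 1
        return Cursor(prev, len(cells[prev]) - 1)
    return cur  # already at the first element of the first cell


if __name__ == "__main__":
    cells = ["a", "\u0e01\u0e35\u0e48", "b"]  # middle cell: KO KAI + SARA II + MAI EK
    cur = Cursor(2, 0)
    for _ in range(4):
        cur = move_left(cells, cur)
        print(cur)
```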

-Thep.
-- 
Theppitak Karoonboonyanan
Thai Linux Working Group (TLWG)
http://linux.thai.net/thep/



Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-09-29 Thread Theppitak Karoonboonyanan

On Sat, Sep 28, 2002 at 09:49:43AM +0900, Tomohiro KUBOTA wrote:
 I think this choice is shell's responsibility, not terminal.
 This is because now single BACKSPACE means keyboard typing,
 not submitting of 0x08 to tty.
 
 Submitting of 0x08 to tty should *always* move cursor left
 one column, regardless of what character is written on the
 left column.  This is because terminals cannot tell the
 context of accepted 0x08.  Many softwares uses 0x08 to
 move one column left.
 
 For example, even if the left column is doublewidth character,
 0x08 moves *one* column and the cursor will be located at the
 right half of the doublewidth character.  Bash-2.05b is aware
 of this behavior and, when BACKSPACE key is pressed after
 doublewidth character, bash issues 0x08 0x08 0x20 0x20 0x08 0x08 
 to the tty to erase the whole doublewidth character.  (It is
 more complex in real, to handle line folding.)

Seems so clear to me now. Thanks for the explanation. So, let's
move to bash (and whatever other shells). :)

And the discussion should be moved to li18nux instead of xfree86-i18n.
So, I'll stop following up this thread in xfree86-i18n after this one.

-Thep.
-- 
Theppitak Karoonboonyanan
Thai Linux Working Group (TLWG)
http://linux.thai.net/thep/



[I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-09-27 Thread Art - Arthit Suriyawongkul

I +1 to Theppitak's proposal to fix BACKSPACE behavior for Thai text.

(from: erases the whole combined cell in single BACKSPACE
  to  : erases only last character typed in single BACKSPACE)

Art


Theppitak Karoonboonyanan wrote:
 I think li18nux is related. So, let me cross-post.
 
 On Wed, Sep 18, 2002 at 01:48:15PM +0900, Tomohiro KUBOTA wrote:
 
At Wed, 18 Sep 2002 11:10:15 +0700,
Theppitak Karoonboonyanan wrote:


I've been happily using Thai on XTerm with UTF-8 support. The only
problem is that characters longer than one byte (in UTF-8)
aren't deleted completely with a single backspace. Instead, only the last
byte is removed, while the display shows that the whole character, or
even the whole cell in the case of multi-char cells, is removed. This
results in inconsistency between what is shown on screen and what is
stored in the buffer.

The buffer from which only the last byte is removed is in your shell,
not in XTerm.  Thus, XTerm is not responsible for this problem.

If you are using bash, please try version 2.05b.  This problem is
solved there.  If you are using tcsh, this problem is solved only for
East Asian doublewidth characters but not for Thai.  zsh seems to
support neither multibyte characters nor combining/doublewidth
characters.
 
 
 $ dpkg -l bash
 Desired=Unknown/Install/Remove/Purge/Hold
 | Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
 |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
 ||/ Name           Version        Description
 +++-==============-==============-============================================
 ii  bash           2.05b-3        The GNU Bourne Again SHell
 
 Hmm.. Looks like it needs some additional configuration, because it
 works properly on my friend's machine, but not on the one I'm using
 (both are debian sid).
 
 
PS. The current CVS version of XTerm can handle combining characters
of Thai in TIS-620 encoding, by using CVS version of luit.  If you
have to run softwares with such problems, this may partly help you.
 
 
 Thank you. I will try it soon.
 
 
 Then, here's the part I think li18nux is also related (sorry if this
 has already been discussed):
 
 With bash 2.05b (on the box that works for Thai), I find BACKSPACE
 erases the whole combined cell, e.g. KO KAI + SARA II + MAI EK
 are all erased with a single BACKSPACE stroke.
 
 I know this is what is described in the Unicode implementation guide.
 But it's not what Thai people expect.
 
 The common practice before Unicode (e.g. MS Windows, Solaris, and Thai
 locally developed applications, which all follow the WTT 2.0
 recommendation) is that BACKSPACE will undo the last keystroke, that is,
 just remove the last combining character typed, not the whole cell. On
 the other hand, pressing DELETE will remove the whole cell after the
 cursor.
 
 So, with the Unicode guideline, only half of the requirement is met (for
 DELETE, but not for BACKSPACE).
 
 And, according to a thread in gtk-i18n-list a year ago, some other
 languages also have a similar requirement.
 
 The thread beginning:
   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00014.html
 Responses for Korean, Vietnamese, Indic:
   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00020.html
 With exception in Vietnamese telex mode:
   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00032.html
 Responses for Arabic:
   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00024.html
   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00037.html
 And Tamil:
   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00060.html
 For Thai, a guy has created an illustration to describe the requirement:
   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00066.html
 A stateful solution proposed:
   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00022.html
 
 Should we discuss how to cope with it?
 
 Regards,
 -Thep.





Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-09-27 Thread Tomohiro KUBOTA

Hi,

At Fri, 20 Sep 2002 14:24:06 +0700,
Art - Arthit Suriyawongkul wrote:

 I +1 to Theppitak's proposal to fix BACKSPACE behavior for Thai text.
 
 (from: erases the whole combined cell in single BACKSPACE
   to  : erases only last character typed in single BACKSPACE)

I think this choice is the shell's responsibility, not the terminal's.
This is because a single BACKSPACE now means a key press on the
keyboard, not the sending of 0x08 to the tty.

Sending 0x08 to the tty should *always* move the cursor left
one column, regardless of what character is written in the
left column.  This is because terminals cannot tell the
context in which 0x08 was received.  Much software uses 0x08 to
move one column left.

For example, even if the left column holds a doublewidth character,
0x08 moves *one* column and the cursor will be located at the
right half of the doublewidth character.  Bash-2.05b is aware
of this behavior and, when the BACKSPACE key is pressed after a
doublewidth character, bash issues 0x08 0x08 0x20 0x20 0x08 0x08
to the tty to erase the whole doublewidth character.  (In reality it
is more complex, in order to handle line folding.)
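
The erase sequence above generalizes to "move back N columns, overwrite N spaces, move back N columns". The sketch below shows this under simplifying assumptions: `column_width` is a hypothetical, rough stand-in for wcwidth() built on Python's `unicodedata` module, and line folding is ignored, as in the description above. It is not bash's implementation.

```python
import unicodedata


def column_width(ch: str) -> int:
    """Approximate the number of terminal columns occupied by ch.

    Assumption: nonspacing and enclosing marks take 0 columns, East
    Asian wide and fullwidth characters take 2, everything else takes 1.
    """
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def erase_sequence(ch: str) -> bytes:
    """Bytes a shell could send to erase ch, one 0x08 per column."""
    w = column_width(ch)
    # For a zero-width character this returns an empty sequence; a real
    # shell would have to redraw the cell instead.
    return b"\x08" * w + b" " * w + b"\x08" * w


if __name__ == "__main__":
    print(erase_sequence("a"))       # singlewidth: 0x08 0x20 0x08
    print(erase_sequence("\u3042"))  # doublewidth HIRAGANA LETTER A
```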

---
Tomohiro KUBOTA [EMAIL PROTECTED]
http://www.debian.or.jp/~kubota/
Introduction to I18N  http://www.debian.org/doc/manuals/intro-i18n/