Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior
Hi,

At Wed, 2 Oct 2002 23:21:12 +0700,
Theppitak Karoonboonyanan wrote:

> It may look awkward for 0x08 to be defined as moving back inconsistently. But the situation can still be defined more gracefully if we allow the cursor to stop at each combining character, so that moving left through a combined cell means moving through the combining characters one by one to the base character before advancing to the previous cell.
>
> This implementation has been adopted by some locally patched terminal emulators, such as xiterm+thai (available in Debian sid).

This idea is inconsistent with existing software, where the cursor moves one column (half a character) even when it moves across doublewidth characters. There is also existing software that treats combining characters in this way (contrary to your idea).

I think your idea could be implemented so that it is enabled only when some option is specified. However, in the future, I think a single internationalized program should work well for everyone in the world. To achieve this, the terminal's behavior must be defined consistently. At the least, the definition of 0x08 must not be modified. In that case, new control codes would be added for the character-element-based movement you propose.

---
Tomohiro KUBOTA [EMAIL PROTECTED]
http://www.debian.or.jp/~kubota/
Introduction to I18N http://www.debian.org/doc/manuals/intro-i18n/

___
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior
On Thu, Oct 03, 2002 at 06:11:47PM +0900, Tomohiro KUBOTA wrote:
> At Wed, 2 Oct 2002 23:21:12 +0700,
> Theppitak Karoonboonyanan wrote:
>
>> It may look awkward for 0x08 to be defined as moving back inconsistently. But the situation can still be defined more gracefully if we allow the cursor to stop at each combining character, so that moving left through a combined cell means moving through the combining characters one by one to the base character before advancing to the previous cell.
>>
>> This implementation has been adopted by some locally patched terminal emulators, such as xiterm+thai (available in Debian sid).
>
> This idea is inconsistent with existing software, where the cursor moves one column (half a character) even when it moves across doublewidth characters. There is also existing software that treats combining characters in this way (contrary to your idea).
>
> I think your idea could be implemented so that it is enabled only when some option is specified. However, in the future, I think a single internationalized program should work well for everyone in the world. To achieve this, the terminal's behavior must be defined consistently. At the least, the definition of 0x08 must not be modified. In that case, new control codes would be added for the character-element-based movement you propose.

Yes, you're right. New control codes would not compromise any requirement the way my proposal does.

Another possibility is that bash and other shells be modified so that they don't emit 0x08 when BACKSPACE is pressed on a combined cell. Hmm.. This may be more sound.

-Thep.
--
Theppitak Karoonboonyanan
Thai Linux Working Group (TLWG)
http://linux.thai.net/thep/
Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior
Hi,

At Thu, 3 Oct 2002 16:53:02 +0700,
Theppitak Karoonboonyanan wrote:

> Another possibility is that bash and other shells be modified so that they don't emit 0x08 when BACKSPACE is pressed on a combined cell. Hmm.. This may be more sound.

In *any* case, bash (or any other shell) does not emit 0x08 alone when the BACKSPACE key is pressed. For example, after a normal (singlewidth) character, 0x08 0x20 0x08 is emitted. Thus, you don't need to regard a deviation from "BACKSPACE key equals 0x08" as something bad.

To erase only the last combining character element, i.e., to implement the behavior you prefer, the shell must emit 0x08 followed by the base character code. If the previous combined character consists of one base character and multiple combining characters, the combining character codes (all but the last one) must be emitted as well. The shell must also erase the last character from its internal buffer. Of course, that character may be one byte (for example, in TIS-620) or more (for example, in UTF-8).

Conversely, for the BACKSPACE key to erase the whole combined character, the shell must emit 0x08 0x20 0x08 and erase the whole combined character from its internal buffer.

I think both behaviors are technically possible. I don't have any opinion on which behavior should be implemented (or which should be the default if both are implemented). I think it is a good idea to consult the people who wrote the bash-2.05b i18n patch about what Thai people expect of the BACKSPACE key.

---
Tomohiro KUBOTA [EMAIL PROTECTED]
http://www.debian.or.jp/~kubota/
Introduction to I18N http://www.debian.org/doc/manuals/intro-i18n/
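The two shell behaviors described in this message can be sketched in Python as follows. This is an illustration only, not code from bash: the function names `erase_whole_cell` and `erase_last_element` are hypothetical, the buffer is simplified to hold exactly one combined cell, and the sketch assumes a single-width base character and a terminal that discards a cell's old combining characters when a new base character is written over it.

```python
from typing import Tuple

BS = b"\x08"  # moves the cursor one column left
SP = b"\x20"  # overwrites the cell under the cursor with a blank


def erase_whole_cell(cell: str) -> Tuple[str, bytes]:
    """Erase the base character together with all its combining characters.

    Returns (what remains in the shell's buffer, bytes sent to the tty).
    """
    if not cell:
        raise ValueError("nothing to erase")
    return "", BS + SP + BS


def erase_last_element(cell: str, encoding: str = "utf-8") -> Tuple[str, bytes]:
    """Erase only the last code point of the cell and redraw the rest."""
    if not cell:
        raise ValueError("nothing to erase")
    remaining = cell[:-1]
    if not remaining:
        # Only the base character was present: ordinary erase.
        return "", BS + SP + BS
    # Step back onto the cell, then resend the base character and the
    # combining characters that survive (all but the last one).
    return remaining, BS + remaining.encode(encoding)


if __name__ == "__main__":
    thai_cell = "\u0e01\u0e35\u0e48"  # KO KAI + SARA II + MAI EK
    print(erase_last_element(thai_cell, "tis_620")[1])
    print(erase_last_element(thai_cell, "utf-8")[1])
```

The number of bytes sent depends on the encoding (one byte per character in TIS-620, three in UTF-8 for Thai), while the shell's internal buffer loses exactly one character in both cases.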
Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior
I had originally argued strongly in favour of a BACKSPACE display semantics that removes the character left of the cursor (let's call this character L) and then moves the cursor wcwidth(L) character cells to the left. This is by far the most sensible solution, because this way, if you echo the keyboard output back into the display, pressing backspace gives you exactly the effect you would expect in an editor. The result would have been that, for backspace to work correctly with double-width (and combining) characters, no changes would have to be made to the tty cooked-mode editor in the kernel that you get when you type text into stdin of any Unix application.

Unfortunately, existing CJK implementation practice has messed this up and has used backspace with a move-cursor-left-one-cell display semantics. An argument has been made that in UTF-8 modes we have to stay compatible with this highly unfortunate and inconvenient CJK implementation practice, but I am still not convinced that

a) there really is such a backwards compatibility requirement

b) the 1-cell-left semantics of backspace has any advantage whatsoever over the erase-1-character-left semantics

I would say at least that the jury is still out on what a backspace sent to a UTF-8 terminal means, and I'd advise authors of editors not to send any backspace 0x08 characters to terminals. Please use absolute or relative cursor positioning command sequences, which have unambiguous semantics.

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: http://www.cl.cam.ac.uk/~mgk25/
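A minimal sketch of the advice above: erasing the character left of the cursor with relative cursor positioning instead of 0x08. It assumes a terminal that implements the ECMA-48 sequences CUB (`CSI n D`, cursor backward) and ECH (`CSI n X`, erase characters). `approx_wcwidth` and `erase_left` are hypothetical names, and `approx_wcwidth` is only a rough stand-in for wcwidth() built on Python's `unicodedata` module, not a faithful port.

```python
import unicodedata

CSI = "\x1b["


def approx_wcwidth(ch: str) -> int:
    """Approximate column width: 0 for combining marks, 2 for wide, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def erase_left(ch: str) -> str:
    """Escape sequence that erases `ch`, assumed to sit left of the cursor."""
    n = approx_wcwidth(ch)
    if n == 0:
        # A combining mark has no cell of its own; removing it means
        # redrawing its cell, which this sketch does not attempt.
        return ""
    # Move left by the character's width, then blank that many cells.
    return f"{CSI}{n}D{CSI}{n}X"


if __name__ == "__main__":
    print(repr(erase_left("a")))
    print(repr(erase_left("\u3042")))  # HIRAGANA LETTER A, doublewidth
```

Because the column count is stated explicitly, the result does not depend on whether the terminal treats 0x08 as one cell or one character.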
Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior
On Sun, Sep 29, 2002 at 10:50:05PM +0700, Theppitak Karoonboonyanan wrote:
> On Sat, Sep 28, 2002 at 09:49:43AM +0900, Tomohiro KUBOTA wrote:
>
>> I think this choice is the shell's responsibility, not the terminal's. This is because a single BACKSPACE here means a key press, not the submission of 0x08 to the tty.
>>
>> Submitting 0x08 to the tty should *always* move the cursor left one column, regardless of what character is written in the column to the left. This is because terminals cannot tell the context in which a 0x08 was received. Much software uses 0x08 to move one column left. For example, even if the column to the left holds a doublewidth character, 0x08 moves *one* column and the cursor ends up on the right half of the doublewidth character.
>>
>> Bash-2.05b is aware of this behavior and, when the BACKSPACE key is pressed after a doublewidth character, bash issues 0x08 0x08 0x20 0x20 0x08 0x08 to the tty to erase the whole doublewidth character. (In reality it is more complex, in order to handle line folding.)
>
> Seems so clear to me now. Thanks for the explanation. So, let's move to bash (and whatever other shells). :)
>
> And the discussion should be moved to li18nux instead of xfree86-i18n. So, I'll stop following up this thread in xfree86-i18n after this one.

Umm.. After digging through some code, I've changed my mind. Is it still possible for bash to tell xterm to remove just the last combining character of the previous cell while the cursor stays where it is? In that case, CursorBack() in cursor.c would need to inspect the screen buffer (using SCRN_BUF_COM1L() and so on) to determine whether to move back.

It may look awkward for 0x08 to be defined as moving back inconsistently. But the situation can still be defined more gracefully if we allow the cursor to stop at each combining character, so that moving left through a combined cell means moving through the combining characters one by one to the base character before advancing to the previous cell.

This implementation has been adopted by some locally patched terminal emulators, such as xiterm+thai (available in Debian sid). Is it possible for xterm?

-Thep.
--
Theppitak Karoonboonyanan
Thai Linux Working Group (TLWG)
http://linux.thai.net/thep/
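One possible reading of the cursor model proposed above, as a small Python sketch. It is not taken from xterm or xiterm+thai: `move_left` and the `(cell, element)` cursor representation are hypothetical, and the choice to land on the last combining character when entering a cell from the right is an assumption.

```python
from typing import List, Tuple


def move_left(cells: List[str], cell: int, elem: int) -> Tuple[int, int]:
    """Move the cursor one step left.

    `cells` holds one string per screen cell: a base character followed by
    its combining characters. The cursor rests on element `elem` of cell
    `cell`, where element 0 is the base character.
    """
    if elem > 0:
        # Still inside a combined cell: step to the previous element.
        return cell, elem - 1
    if cell > 0:
        # On a base character: enter the previous cell at its last element.
        return cell - 1, len(cells[cell - 1]) - 1
    # Already at the first element of the first cell.
    return cell, elem


if __name__ == "__main__":
    # KO KAI + SARA II + MAI EK in one cell, then KHO KHAI in the next.
    line = ["\u0e01\u0e35\u0e48", "\u0e02"]
    pos = (1, 0)
    for _ in range(4):
        pos = move_left(line, *pos)
        print(pos)
```

Under this reading, the cursor passes through MAI EK and SARA II before it reaches KO KAI, so one step left no longer equals one column left.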
Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior
On Sat, Sep 28, 2002 at 09:49:43AM +0900, Tomohiro KUBOTA wrote:

> I think this choice is the shell's responsibility, not the terminal's. This is because a single BACKSPACE here means a key press, not the submission of 0x08 to the tty.
>
> Submitting 0x08 to the tty should *always* move the cursor left one column, regardless of what character is written in the column to the left. This is because terminals cannot tell the context in which a 0x08 was received. Much software uses 0x08 to move one column left. For example, even if the column to the left holds a doublewidth character, 0x08 moves *one* column and the cursor ends up on the right half of the doublewidth character.
>
> Bash-2.05b is aware of this behavior and, when the BACKSPACE key is pressed after a doublewidth character, bash issues 0x08 0x08 0x20 0x20 0x08 0x08 to the tty to erase the whole doublewidth character. (In reality it is more complex, in order to handle line folding.)

Seems so clear to me now. Thanks for the explanation. So, let's move to bash (and whatever other shells). :)

And the discussion should be moved to li18nux instead of xfree86-i18n. So, I'll stop following up this thread in xfree86-i18n after this one.

-Thep.
--
Theppitak Karoonboonyanan
Thai Linux Working Group (TLWG)
http://linux.thai.net/thep/
[I18n]Re: [li18nux:1094] BACKSPACE behavior
+1 to Theppitak's proposal to fix BACKSPACE behavior for Thai text.

  from: a single BACKSPACE erases the whole combined cell
  to:   a single BACKSPACE erases only the last character typed

Art

Theppitak Karoonboonyanan wrote:

> I think li18nux is related. So, let me cross-post.
>
> On Wed, Sep 18, 2002 at 01:48:15PM +0900, Tomohiro KUBOTA wrote:
>> At Wed, 18 Sep 2002 11:10:15 +0700,
>> Theppitak Karoonboonyanan wrote:
>>
>>> I've been happily using Thai on XTerm with UTF-8 support. The only problem is that characters longer than one byte (in UTF-8) aren't deleted completely with a single backspace. Instead, only the last byte is removed, while the display shows the whole character, or even the whole cell in the case of multi-character cells, as removed. This results in an inconsistency between what is shown on screen and what is stored in the buffer.
>>
>> The buffer from which only the last byte is removed is in your shell, not in XTerm. Thus, XTerm is not responsible for this problem.
>>
>> If you are using bash, please try version 2.05b. This problem is solved there. If you are using tcsh, this problem is solved only for East Asian doublewidth characters but not for Thai. zsh seems to have no support for multibyte characters or for combining/doublewidth characters.
>
> $ dpkg -l bash
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
> |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
> ||/ Name  Version  Description
> +++-==-==-
> ii  bash  2.05b-3  The GNU Bourne Again SHell
>
> Hmm.. Looks like it needs some additional configuration, because it works properly on my friend's machine, but not on the one I'm using (both are Debian sid).
>
>> PS. The current CVS version of XTerm can handle Thai combining characters in TIS-620 encoding, by using the CVS version of luit. If you have to run software with such problems, this may partly help you.
>
> Thank you. I will try it soon.
>
> Then, here's the part where I think li18nux is also related (sorry if this has already been discussed):
>
> With bash 2.05b (on the box that works for Thai), I find that BACKSPACE erases the whole combined cell, e.g. KO KAI + SARA II + MAI EK are all erased with a single BACKSPACE stroke. I know this is what is described in the Unicode implementation guide. But it's not what Thai people expect.
>
> The common practice before Unicode (e.g. MS Windows, Solaris, and locally developed Thai applications, which all follow the WTT 2.0 recommendation) is that BACKSPACE undoes the last keystroke, that is, it removes just the last combining character typed, not the whole cell. On the other hand, pressing DELETE removes the whole cell after the cursor. So, with the Unicode guideline, only half of the requirement is met (for DELETE, but not for BACKSPACE).
>
> And, according to a thread in gtk-i18n-list a year ago, some other languages have a similar requirement.
>
> The thread beginning:
>   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00014.html
> Responses for Korean, Vietnamese, Indic:
>   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00020.html
> With exception in Vietnamese telex mode:
>   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00032.html
> Responses for Arabic:
>   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00024.html
>   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00037.html
> And Tamil:
>   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00060.html
> For Thai, someone has created an illustration to describe the requirement:
>   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00066.html
> A stateful solution proposed:
>   http://mail.gnome.org/archives/gtk-i18n-list/2001-May/msg00022.html
>
> Should we discuss how to cope with it?
>
> Regards,
> -Thep.
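The editing behavior described above (BACKSPACE removes only the last character typed, DELETE removes the whole cell after the cursor) can be sketched in Python as follows. This is an illustration under stated assumptions, not an implementation of WTT 2.0: the function names are hypothetical, the cursor is a code point index into the text, and "combining character" is approximated by the Unicode general categories Mn and Me.

```python
import unicodedata
from typing import Tuple


def is_combining(ch: str) -> bool:
    """Approximation: nonspacing and enclosing marks join the previous cell."""
    return unicodedata.category(ch) in ("Mn", "Me")


def backspace(text: str, cursor: int) -> Tuple[str, int]:
    """Undo the last keystroke: remove the single code point before the cursor."""
    if cursor == 0:
        return text, cursor
    return text[:cursor - 1] + text[cursor:], cursor - 1


def delete(text: str, cursor: int) -> Tuple[str, int]:
    """Remove the whole cell after the cursor: base plus combining characters."""
    if cursor >= len(text):
        return text, cursor
    end = cursor + 1
    while end < len(text) and is_combining(text[end]):
        end += 1
    return text[:cursor] + text[end:], cursor


if __name__ == "__main__":
    cell = "\u0e01\u0e35\u0e48"      # KO KAI + SARA II + MAI EK
    print(len(backspace(cell, 3)[0]))  # code points left after BACKSPACE
    print(len(delete(cell, 0)[0]))     # code points left after DELETE
```

With the cursor after the cell, three BACKSPACE strokes are needed to clear it; with the cursor before the cell, one DELETE stroke is enough.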
Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior
Hi,

At Fri, 20 Sep 2002 14:24:06 +0700,
Art - Arthit Suriyawongkul wrote:

> +1 to Theppitak's proposal to fix BACKSPACE behavior for Thai text.
>
>   from: a single BACKSPACE erases the whole combined cell
>   to:   a single BACKSPACE erases only the last character typed

I think this choice is the shell's responsibility, not the terminal's. This is because a single BACKSPACE here means a key press, not the submission of 0x08 to the tty.

Submitting 0x08 to the tty should *always* move the cursor left one column, regardless of what character is written in the column to the left. This is because terminals cannot tell the context in which a 0x08 was received. Much software uses 0x08 to move one column left. For example, even if the column to the left holds a doublewidth character, 0x08 moves *one* column and the cursor ends up on the right half of the doublewidth character.

Bash-2.05b is aware of this behavior and, when the BACKSPACE key is pressed after a doublewidth character, bash issues 0x08 0x08 0x20 0x20 0x08 0x08 to the tty to erase the whole doublewidth character. (In reality it is more complex, in order to handle line folding.)

---
Tomohiro KUBOTA [EMAIL PROTECTED]
http://www.debian.or.jp/~kubota/
Introduction to I18N http://www.debian.org/doc/manuals/intro-i18n/
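The byte sequence quoted above generalizes to any column width. A minimal Python sketch, assuming 0x08 always moves exactly one column and ignoring line folding (which, as the message notes, the real code has to handle); `erase_sequence` is a hypothetical name, not a bash function.

```python
def erase_sequence(width: int) -> bytes:
    """Bytes a line editor could send to erase one character of `width` columns.

    Move back over every column, blank each one, then move back again so the
    cursor ends where the erased character began.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    back = b"\x08" * width
    blanks = b"\x20" * width
    return back + blanks + back


if __name__ == "__main__":
    print(erase_sequence(1).hex(" "))  # singlewidth character
    print(erase_sequence(2).hex(" "))  # doublewidth character
```

For width 2 this yields the six bytes 0x08 0x08 0x20 0x20 0x08 0x08 given in the message.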