Re: XTerm locale-sensitivity again [was: Emacs and...]

Tomohiro KUBOTA Mon, 02 Jul 2001 01:14:37 -0700
Hi,

At 01 Jul 2001 19:17:32 +0200,
Juliusz Chroboczek <[EMAIL PROTECTED]> wrote:

> I'm not sure about that.  I think many people agree that it is good to
> explore various ways of proceeding before committing to a single one.
> People are not expressing an opinion before they get the chance to
> play with luit and see whether it is good enough.

I see.


> Nobody has ever said that if luit is not satisfactory, other
> approaches should not be explored.  On the other hand, I, for one, am
> of the mind that a separate client is a more elegant and easier to
> maintain solution than hacking at input.c.

I think your aim is to separate encoding conversion from the core of
XTerm so that the core stays simple.  If so, how about the utf8.[ch]
approach?  It succeeds in separating encoding conversion from the
XTerm core and, at the same time, it is _not_ a separate client.
Would that suit you?

However, unlike utf8.[ch] in recent xterm, the current implementation
of "luit" seems to me too complex to be integrated into the XTerm
source tree.  (Of course I understand that I am not the one who
decides this.  Do I have to add such a disclaimer every time I give
my opinion?)


> 1. Nobody has said that XTerm should not invoke luit in multibyte
>    locales if luit is stable enough.  (Safe sex should be the default.)

I hope nobody will say such a thing! :-)
(I think that not only speakers of multibyte languages but also users
of other complex scripts such as Thai and Hebrew, which the conventional
8-bit mode of XTerm cannot support, will want to use luit by default.)


> 2. The only people who can ``agree'' or ``disagree'' are Thomas
>    Dickey and the XFree86 Core Team.  We mere mortals can merely argue.
>    (Our time may be better spent hacking, though.)

It's a figure of speech.  Do you really not understand what I meant?
What English word would have been appropriate here?


> The same approach can be used for Shift JIS.  It cannot be used for
> UTF-16.  (EUC-TW has few resynchronisation issues.)

Yes.  The difference between Shift_JIS and Big5 is that in Shift_JIS
0xa0-0xdf can never be the leading byte of a multibyte character.
Those bytes are single-byte JIS X 0201 kana and should be mapped to
the HALFWIDTH KATAKANA characters of UCS.  I think UTF-16 is a
nightmare because there is almost no chance of resynchronisation.
(Anyway, I don't think we need UTF-16 support.)
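To make the Shift_JIS/Big5 contrast concrete, here is a minimal sketch of classifying a byte that appears where a character should start.  In Big5 the range 0xA1-0xDF consists of leading bytes; in Shift_JIS, 0xA1-0xDF are complete single-byte JIS X 0201 kana mapping to U+FF61..U+FF9F (0xA0 itself is unassigned).  The names sjis_classify and sjis_class are hypothetical, and the sketch assumes plain Shift_JIS: bytes below 0x80 are treated as ASCII (ignoring the JIS X 0201 yen sign and overline at 0x5C/0x7E), and the vendor lead bytes 0xF0-0xFC used by variants such as CP932 are treated as invalid.

```c
#include <stdint.h>

/* Hypothetical result type: how a byte at a character boundary behaves. */
typedef enum { SJIS_SINGLE, SJIS_LEAD, SJIS_INVALID } sjis_class;

/* Classify byte b as the first byte of a Shift_JIS character.
   For SJIS_SINGLE, *cp receives the UCS code point. */
sjis_class sjis_classify(unsigned char b, uint32_t *cp)
{
    if (b < 0x80) {                   /* treated as ASCII in this sketch */
        *cp = b;
        return SJIS_SINGLE;
    }
    if (b >= 0xA1 && b <= 0xDF) {     /* JIS X 0201 kana -> U+FF61..U+FF9F */
        *cp = 0xFF61 + (uint32_t)(b - 0xA1);
        return SJIS_SINGLE;
    }
    if ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF))
        return SJIS_LEAD;             /* JIS X 0208 leading bytes */
    return SJIS_INVALID;              /* 0x80, 0xA0, 0xF0-0xFF */
}
```

For example, 0xB1 yields U+FF71 (HALFWIDTH KATAKANA LETTER A), whereas the same byte in Big5 would start a two-byte character.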

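(In real usage, EUC-type encodings are more prone to resynchronisation
problems than Shift_JIS.  Imagine a long run of G1 characters: in EUC
every byte of such a run lies in the GR range, so a decoder that has
lost its place cannot tell leading bytes from trailing ones.  In
Shift_JIS, the trailing byte sometimes falls in the GL region [usually
more than once per 10 characters in everyday Japanese text], and since
a GL byte can never be a leading byte, this provides a chance for
resynchronisation.)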
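The trailing-byte argument can be illustrated with a small sketch.  In both EUC-JP and Shift_JIS a byte below 0x80 always ends a character (it is ASCII, or in Shift_JIS possibly a GL trailing byte 0x40-0x7E), so decoding can safely restart right after it.  The function name next_safe_boundary is hypothetical; the byte sequences in the usage note are "日本語" in each encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset just past the first byte below 0x80 at or after
   pos (a guaranteed character boundary in EUC-JP and Shift_JIS), or
   SIZE_MAX if s[pos..n) contains no such byte. */
size_t next_safe_boundary(const unsigned char *s, size_t n, size_t pos)
{
    for (size_t i = pos; i < n; i++)
        if (s[i] < 0x80)      /* ASCII, or a Shift_JIS GL trailing byte */
            return i + 1;
    return SIZE_MAX;          /* no boundary yet: keep waiting */
}
```

Starting mid-character at offset 1 of the Shift_JIS bytes 93 FA 96 7B 8C EA, the trailing byte 0x7B gives a boundary at offset 4; the EUC-JP bytes C6 FC CB DC B8 EC contain no byte below 0x80, so a lost decoder gets no such hint until an ASCII character appears.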


I think iconv() can handle resynchronization in a practical way,
though I have not proved it:

1. Store the incoming byte in the buffer.  If the buffer is already
   full, first throw away the first byte of the buffer.
2. If iconv(buffer) fails with EINVAL (incomplete sequence), don't
   clear the buffer, don't output anything, and proceed to the next byte.
3. If iconv(buffer) fails with EILSEQ (invalid sequence), throw away
   the first byte of the buffer, don't output anything, and call
   iconv() again without waiting for the next byte.
4. If iconv(buffer) succeeds, output the character, clear the
   buffer, and proceed to the next byte.
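The four steps can be sketched in C roughly as follows.  This is a minimal sketch under stated assumptions, not luit's or xterm's code: the names resync_decoder, resync_init, resync_feed, resync_decode and the buffer size RESYNC_BUF_MAX are hypothetical, the source encoding is assumed to be stateless (EUC, Shift_JIS, Big5, UTF-8; not ISO-2022), and output is collected as UCS-4LE code points.  One small liberty: if iconv() converts some characters before reporting EINVAL or EILSEQ, those characters are emitted and removed from the buffer before the step is applied.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical limit, longer than any single character in EUC-JP,
   Shift_JIS, Big5 or UTF-8. */
#define RESYNC_BUF_MAX 8

typedef struct {
    iconv_t cd;                         /* fromcode -> UCS-4LE */
    unsigned char buf[RESYNC_BUF_MAX];  /* bytes not yet converted */
    size_t len;
} resync_decoder;

int resync_init(resync_decoder *d, const char *fromcode)
{
    assert(d != NULL && fromcode != NULL);
    d->len = 0;
    d->cd = iconv_open("UCS-4LE", fromcode);
    return d->cd == (iconv_t)-1 ? -1 : 0;
}

void resync_close(resync_decoder *d)
{
    iconv_close(d->cd);
}

static void drop_first(resync_decoder *d)
{
    d->len--;
    memmove(d->buf, d->buf + 1, d->len);
}

/* Feed one byte.  Up to cap decoded code points are stored in cps;
   the return value is how many were stored. */
size_t resync_feed(resync_decoder *d, unsigned char byte,
                   uint32_t *cps, size_t cap)
{
    size_t count = 0;

    /* Step 1: store the byte, discarding the oldest one if full. */
    if (d->len == RESYNC_BUF_MAX)
        drop_first(d);
    d->buf[d->len++] = byte;

    while (d->len > 0) {
        unsigned char out[8 * RESYNC_BUF_MAX];
        char *in = (char *)d->buf, *op = (char *)out;
        size_t inleft = d->len, outleft = sizeof out;

        iconv(d->cd, NULL, NULL, NULL, NULL);  /* back to initial state */
        size_t r = iconv(d->cd, &in, &inleft, &op, &outleft);
        int err = (r == (size_t)-1) ? errno : 0;

        /* Emit whatever was converted before iconv() stopped (all of the
           buffer in step 4) and keep only the unconverted tail. */
        for (size_t i = 0; i + 4 <= sizeof out - outleft; i += 4)
            if (count < cap)
                cps[count++] = (uint32_t)out[i] | (uint32_t)out[i + 1] << 8 |
                               (uint32_t)out[i + 2] << 16 |
                               (uint32_t)out[i + 3] << 24;
        memmove(d->buf, in, inleft);
        d->len = inleft;

        if (err != EILSEQ)
            break;   /* step 2 (EINVAL: wait for more) or step 4 (done) */

        /* Step 3: invalid sequence; drop its first byte and retry now. */
        drop_first(d);
    }
    return count;
}

/* Convenience wrapper: decode a whole byte string.  Returns the number
   of code points stored, or (size_t)-1 if fromcode is unsupported. */
size_t resync_decode(const char *fromcode, const unsigned char *s, size_t n,
                     uint32_t *cps, size_t cap)
{
    resync_decoder d;
    size_t count = 0;

    if (resync_init(&d, fromcode) != 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; i++)
        count += resync_feed(&d, s[i], cps + count, cap - count);
    resync_close(&d);
    return count;
}
```

A call such as resync_decode("EUC-JP", buf, len, cps, cap) would decode EUC-JP text, assuming the local iconv() knows that encoding name.  Because 0x0A is never valid inside a multibyte character in these encodings, an LF after a stray leading byte should make iconv() report EILSEQ; the stray byte is then dropped and the LF itself is emitted, which is the "LF will cancel everything" behaviour.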

IMO, you don't need to be very careful about resynchronization.  We
speakers of multibyte languages have a long history of handling
multibyte encodings, and experience has taught us that this problem
does not need much care.  At worst, an LF will cancel everything.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/