XTerm locale-sensitivity again [was: Emacs and...]

Juliusz Chroboczek Sun, 01 Jul 2001 09:54:17 -0700
TK> When you announced development of "luit", nobody (other than me)
TK> asked why you develop it even though we already have
TK> locale-sensibility patch.  I thought that this implies that
TK> members of [EMAIL PROTECTED] and this list thought "luit"-approach
TK> is better.

I'm not sure about that.  I think many people agree that it is good to
explore various ways of proceeding before committing to a single one.
People are holding off on expressing an opinion until they get the
chance to play with luit and see whether it is good enough.

Nobody has ever said that if luit is not satisfactory, other
approaches should not be explored.  On the other hand, I, for one, am
of the mind that a separate client is a more elegant and easier to
maintain solution than hacking at input.c.

TK> Why not discuss now?

Because I'm not ready to discuss it before I finish luit.

Because we've got five months until the next XFree86 release, and this
is the right time to explore and be fanciful rather than committing to
a single solution.

TK> I will not agree if "condom" means "user has to invoke it [...]"

1. Nobody has said that XTerm should not invoke luit in multibyte
   locales if luit is stable enough.  (Safe sex should be the default.)

2. The only people who can ``agree'' or ``disagree'' are Thomas
   Dickey and the XFree86 Core Team.  We mere mortals can merely argue.
   (Our time may be better spent hacking, though.)

TK> I feel "luit" is a seed for another flamewar.

(Or, in other words, unless you agree with me I'm going to cry?)

TK> I guess you mean you might be bothered by many encodings such as
TK> EUC-TW, Shift_JIS, GB18030, and so on.

Roughly speaking, yes.

The current version of luit is designed to be trivial to extend with
new encodings.  On the other hand, including a new encoding
*structure* requires writing handcrafted code, which carefully deals
with resynchronisation.

This approach is correct if we only need a few more irregular charset
structures (currently, only 96+128 and Big 5 are implemented).  On the
other hand, if many more irregular charsets are to come, then I'll
have to think about factoring this code out; and it is not quite clear
to me how to do that with perfect resynchronisation behaviour.

TK> Then how about using iconv() for "luit"?  Portability problem?

There are two reasons.  First, iconv is not designed for live streams,
but for converting static strings; thus, it does not deal with
resynchronisation well.  This is not simply an implementation issue --
iconv does not provide the necessary interfaces to deal with
resynchronisation.  (Or, more exactly, it does not provide all the
necessary interfaces.)
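
A minimal sketch of what driving a live stream through the POSIX
iconv() call looks like. It only illustrates what the interface does
expose: on an invalid byte it stops with EILSEQ and leaves the input
pointer at the offending byte, and on a truncated sequence at the end
of a chunk it stops with EINVAL. The recovery policy is left entirely
to the caller. The helper name stream_convert, its parameters, and the
"skip one byte" policy are assumptions made for this example, not
anything taken from luit.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert one chunk of a stream.
 * On return, *inlen holds the number of trailing input bytes that were
 * NOT consumed (an incomplete sequence the caller must prepend to the
 * next chunk).  Each invalid byte is skipped and counted in *skipped.
 * Returns the number of bytes written to out, or (size_t)-1 if the
 * output buffer is too small or another error occurs.
 */
static size_t stream_convert(iconv_t cd, const char *in, size_t *inlen,
                             char *out, size_t outcap, size_t *skipped)
{
    char *ip = (char *)in;   /* iconv() takes char **, not const char ** */
    char *op = out;
    size_t il = *inlen, ol = outcap;

    assert(cd != (iconv_t)-1);
    while (il > 0) {
        if (iconv(cd, &ip, &il, &op, &ol) != (size_t)-1)
            break;                       /* whole chunk converted */
        if (errno == EILSEQ) {
            /* Naive resynchronisation: drop one byte and retry. */
            ip++;
            il--;
            (*skipped)++;
            iconv(cd, NULL, NULL, NULL, NULL);  /* back to initial state */
        } else if (errno == EINVAL) {
            break;                       /* truncated sequence at the end */
        } else {
            *inlen = il;                 /* E2BIG or something unexpected */
            return (size_t)-1;
        }
    }
    *inlen = il;
    return (size_t)(op - out);
}
```

Whether dropping exactly one byte is the right choice depends on the
structure of the encoding, and that knowledge has to live in the
caller rather than in iconv itself.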

I don't blame the designers of iconv for that: designing sufficiently
powerful interfaces is tricky.  This is what I meant by ``factoring''
above.

The second reason is that I personally dislike iconv, and feel that
such a beast has no place in libc.  Feel free to use iconv in your
code, but I'm not going to use it in mine.

TK> I don't understand why the current implementation of "luit" can
TK> avoid [the resynchronisation] problem while iconv() approach
TK> cannot.

Because luit contains carefully hand-crafted resynchronisation code.
While I have not proved it, I believe that the current implementation
of resynchronisation for Big 5 is optimal within the constraints of
one byte of memory and no lookahead.  See iso2022.c after line 671.
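
To make "one byte of memory and no lookahead" concrete, here is a
minimal sketch of a Big 5 decoder front end under those constraints.
It is not the code in iso2022.c, and it makes no claim to optimality.
All names (big5_state, big5_feed, the event types) are hypothetical,
and the byte ranges are the commonly cited ones for plain Big 5 (lead
0xA1-0xF9, trail 0x40-0x7E or 0xA1-0xFE), assumed here for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The entire decoder state: one remembered lead byte, 0 if none. */
typedef struct {
    unsigned char pending;
} big5_state;

typedef enum { EV_SINGLE, EV_DOUBLE, EV_ERROR } ev_kind;

typedef struct {
    ev_kind kind;
    unsigned char b1, b2;    /* b2 is only meaningful for EV_DOUBLE */
} big5_event;

static int is_lead(unsigned char c)
{
    return c >= 0xA1 && c <= 0xF9;
}

static int is_trail(unsigned char c)
{
    return (c >= 0x40 && c <= 0x7E) || (c >= 0xA1 && c <= 0xFE);
}

/*
 * Feed one byte from the stream.  Writes 0, 1 or 2 events to out and
 * returns how many.  Only the current byte and st->pending are ever
 * examined, so there is no lookahead.
 */
static size_t big5_feed(big5_state *st, unsigned char c, big5_event out[2])
{
    size_t n = 0;

    assert(st != NULL && out != NULL);
    if (st->pending) {
        unsigned char lead = st->pending;
        st->pending = 0;
        if (is_trail(c)) {
            out[n++] = (big5_event){ EV_DOUBLE, lead, c };
            return n;
        }
        /* Stray lead byte: report it, then treat c as a fresh byte. */
        out[n++] = (big5_event){ EV_ERROR, lead, 0 };
    }
    if (c < 0x80)
        out[n++] = (big5_event){ EV_SINGLE, c, 0 };
    else if (is_lead(c))
        st->pending = c;     /* wait for the trail byte */
    else
        out[n++] = (big5_event){ EV_ERROR, c, 0 };
    return n;
}
```

In this sketch, a lead byte followed by a byte that cannot be a trail
(a newline, say) produces one error event, and the newline still gets
through, so the decoder is back in step at that point.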

The same approach can be used for Shift JIS.  It cannot be used for
UTF-16.  (EUC-TW has few resynchronisation issues.)

(As to EUC-TW, I'm not going to implement this because (1) nobody
seems to use it and (2) XFree86 lacks the necessary fontenc tables.
If somebody really needs it, they can contribute the needed tables and
I'll implement it.)

Regards,

                                        Juliusz
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/