Re: current idea

George W Gerrity Sun, 04 Nov 2001 14:05:08 -0800

At 22:34 +0100 2001-11-03, Werner LEMBERG wrote:
>
>>  Ai.ai, ai! Kludge upon kludge! Surely it really IS a better idea to
>>  start back at the beginning and rewrite the underlying Lisp engine
>>  to handle UTF-8, and do it right.
>
>Maybe a misunderstanding.  The underlying Lisp engine *never* sees
>UTF-8.  We are talking about buffer and string representations.


I didn't keep copies of all the past correspondence, but I understood
one message of a week or so ago to say that part of the reason for not
doing an entire rewrite was that the Lisp engine didn't talk UTF-8,
and, moreover, couldn't easily be made to.

>>  To pick up on part of this conversation, UNLESS you use a
>>  fixed-length internal code for ALL Unicode characters (and I suspect
>>  the problem is with the underlying Lisp that makes it expensive to
>>  do the obvious and use UCS-32),
>
>A 22bit integer is used for that purpose.

That comment in another letter reinforced my belief that the Lisp
engine was the trouble. I ASSUMED that the Lisp atom was a 32-bit
word, and that the missing ten bits were taken up with tags, etc. The
point is, however, that maybe 22 bits is OK for this round, but what
do you do in a year or two when the higher planes get more populated,
and someone wants to use Emacs for some quick and dirty editing of a
scholarly work on cuneiform, say?
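
To put a number on the width question, here is a minimal Python
sketch. It ASSUMES the code space stops at U+10FFFF, as the Unicode
Standard defines it; if the full 31-bit ISO 10646 range were ever
needed, the answer would be different. The name bits_needed is
hypothetical and has nothing to do with the Emacs sources.

```python
# Assumption: the highest code point is U+10FFFF (the Unicode code space).
UNICODE_MAX = 0x10FFFF

def bits_needed(code_point: int) -> int:
    """Return how many bits an unsigned integer needs to hold code_point."""
    return max(1, code_point.bit_length())

# The top of the code space fits in 21 bits, so a 22-bit field leaves one spare.
print(bits_needed(UNICODE_MAX))

# A code point in plane 1 (U+12000, used here only as a sample value).
print(bits_needed(0x12000))
```

Under that assumption a 22-bit field covers every plane; the concern
only bites if the tag bits leave less room than that, or if the code
space is not bounded as assumed.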

>>  If you clean emacs up so that UTF-8 is its native character set,
>>  then you have only ONE, CLEAN interface to design around.  It should
>>  handle ASCII (and ISO Latin-1?) transparently, as it is a clean
>>  subset.  That in itself should keep 90% of users quiet and
>>  satisfied.
>
>Again, what we are talking about here is nothing the casual user will
>ever see.

That is largely, but not completely, true, as near as I can tell from
previous correspondence. What prompted my wail was the contortions
you appear to be going through to arrive at this transparency, when
it is anything but transparent underneath.

I am much more concerned that, having done all this, the result will
be totally (as opposed to nearly) unmaintainable. I was trying to say
that a clean rewrite is probably overdue anyway, and in the end, it
might just be quicker, especially since you can then move most of
the translation out to the I/O interface during saves and restores,
rather than doing it inside the bowels of the editor on the fly.

Moreover, if the internal code is canonical UTF-8, and not some
historical (essentially) single-byte-oriented code, you will have a
1:n translator problem rather than what is essentially an n^2
translator problem, the latter arising because the internal code is
such a bad fit for everything.
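
As a rough illustration of the 1:n point, here is a minimal Python
sketch. It uses Python's own codecs as stand-ins for the translators,
and the names transcode and translators_needed are hypothetical; none
of this is Emacs code. Every conversion goes through one Unicode
pivot, so each encoding needs only a decoder and an encoder.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from encoding src to encoding dst via a Unicode pivot."""
    text = data.decode(src)   # encoding X -> pivot
    return text.encode(dst)   # pivot -> encoding Y

def translators_needed(n: int) -> dict:
    """Count translators for n external encodings under both designs."""
    return {
        "pairwise": n * (n - 1),  # one per ordered pair of encodings
        "via_pivot": 2 * n,       # one decoder plus one encoder per encoding
    }

# Latin-1 to UTF-8 through the pivot.
latin1_bytes = "caf\u00e9".encode("latin-1")
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")

# With 20 encodings the pairwise design needs 380 translators, the pivot 40.
counts = translators_needed(20)
```

The count is only an idealisation: it assumes every encoding maps
cleanly onto the pivot, which is exactly what a bad-fit internal code
fails to provide.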

A final point is that by hiving off the translators to I/O, you can 
delay writing many of them, since there are already some pretty 
reasonable UTF-8 <-> Encoding-X translators available that can 
probably be adapted as UN*X pipes.
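
A translator hived off to I/O could look something like the following
Python sketch of a stream filter. It is an assumption about the shape
of such a tool, not an existing program: recode_stream is a
hypothetical name, and the incremental codecs from Python's standard
codecs module stand in for whatever translator is actually adapted.

```python
import codecs
import io

def recode_stream(src, dst, from_enc, to_enc="utf-8", chunk_size=4096):
    """Copy bytes from src to dst, converting from from_enc to to_enc.

    Incremental codecs are used so that a multi-byte character split
    across two chunks is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder(from_enc)()
    encoder = codecs.getincrementalencoder(to_enc)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(encoder.encode(decoder.decode(chunk)))
    # Flush both codecs; a truncated trailing sequence raises an error here.
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))

# In-memory demonstration, one byte at a time to exercise split characters.
source = io.BytesIO("\u65e5\u672c\u8a9e".encode("euc_jp"))
sink = io.BytesIO()
recode_stream(source, sink, "euc_jp", chunk_size=1)
```

Used as a pipe, the same function would be called with
sys.stdin.buffer and sys.stdout.buffer, so the editor itself would
only ever see UTF-8.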

George
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
