Re: UTF8 Terminal Detection

George W Gerrity Tue, 13 Nov 2001 15:01:22 -0800

At 10:35 +0000 2001-11-13, Edmund GRIMLEY EVANS wrote:

><cut>
>
>George W Gerrity <[EMAIL PROTECTED]>:
>
>>  This correspondence thread (and others) has raised a question in my
>>  mind that someone may be able to answer. It appears that a LOT of
>>  problems could be settled if a better locale environment standard
>>  for *UNIX* (and clones) were to be put in place. I have in mind the
>>
>>  <cut>
>>
>>  The question is -- is there anyone working on standardising this
>>  aspect so that it IS meaningful to query environment variables and
>>  expect to get the relevant information?
>
>This sounds a bit like moving from TERM to TERMCAP.


You are right, I AM confusing two things. It was a mistake to compare 
the way it is done in *UNIX* to Mac OS.

>So, what you're suggesting is that instead of having an environment
>variable LANG (LC_CTYPE, etc) that just names the locale, you have an
>environment variable that completely describes the locale, so you
>don't need to worry about local and remote systems referring to
>possibly inconsistent databases (sets of installed locales).

Yes. I forgot about termcap, but it only describes a given terminal 
(or printer) environment, and it wasn't really designed to describe a 
writing system including an input method, the way dates are encoded, 
etc. The problem with my suggestion is that it involves a major 
change in the *UNIX* OS philosophy, which I doubt will happen.

For those of you not familiar with Mac OS, a keyboard layout and
input method are coupled to a language (group) AND a font system.
Thus, there are keyboards and (simple, in this case) input methods
for most Slavic languages that use the Cyrillic alphabet, coupled to
Cyrillic fonts. Each keyboard is also coupled to
date/time/currency/number layout specs and a collation order. If
anything is missing, you can't select the appropriate keyboard. Of
course, if UTF-8 is used, the font side of such a scheme reduces to
determining whether the relevant block has an available font; the
question of an input method remains.

Thus, what we want is a unified API so that all of you developers
(and maybe I) can go about your business in a modular way and
share code. In the environment that I am speaking of, those
developing a UTF-8 xterm would use an API to access mapping tables, 
collating sequences, fonts and font metrics, etc, based on a global 
environment variable. The internal job of interpreting and displaying 
the glyphs would be somewhat simplified. In fact, the Mac OS 
environment includes text edit and display primitives (as does Java) 
that could be shared by ANY display application.
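To make "based on a global environment variable" concrete, here is a
minimal sketch of the first step such an API would need: resolving the
active character-type locale from the usual POSIX variables and
splitting it into its parts. The function names are hypothetical, and
the only rule assumed is the standard precedence LC_ALL, then LC_CTYPE,
then LANG. A locale name without a codeset part tells you nothing about
the encoding, so a real program should ask the C library
(nl_langinfo(CODESET)) rather than rely on the name alone.

```python
import os


def detect_ctype_locale(env=None):
    """Return (language, territory, codeset) from POSIX locale variables.

    Hypothetical helper: it only parses the variable text and does not
    check that the named locale is actually installed.
    """
    env = os.environ if env is None else env
    value = ""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            break
    if not value:
        value = "C"  # nothing set: fall back to the default "C" locale
    # Drop an optional "@modifier", then split off the ".codeset" part.
    value = value.split("@", 1)[0]
    name, _, codeset = value.partition(".")
    language, _, territory = name.partition("_")
    return language, territory, codeset


def looks_like_utf8(env=None):
    """True if the locale name itself advertises a UTF-8 codeset."""
    codeset = detect_ctype_locale(env)[2]
    return codeset.replace("-", "").lower() == "utf8"
```

Passing a plain dictionary instead of the real environment, for
example looks_like_utf8({"LANG": "en_US.UTF-8"}), makes the precedence
rule easy to try out by hand.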

>Now, you obviously wouldn't want to encode the entire table of UCS
>character properties in an environment variable: even if it were
>practicable you probably wouldn't want to give people an easy way of
>creating private variations of Unicode. However, there are aspects of
>the current locale system that look to me as though they might make
>more sense if they were treated more in the TERMCAP way.

Yes. Missing fonts and input methods, collation methods, etc, would 
be trapped by default methods. I think that I have explained above 
that the idea is to provide an API that uses a (sparse) table lookup 
method to access relevant methods and features. Either the Mac OS
model or the Java Text model would be a good place to start.
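As a rough illustration of the sparse lookup with default methods, and
not a description of how Mac OS or Java actually do it, the sketch
below keys a small table on (locale, feature) and falls back first to
the bare language and then to a default. All names (TABLE, DEFAULTS,
lookup) and the table entries are made up for the example.

```python
# Defaults that "trap" any feature a locale does not supply itself.
DEFAULTS = {
    "collate": lambda words: sorted(words),  # plain code-point order
    "decimal_point": ".",
}

# Sparse table: only the entries that differ from the defaults are
# stored. The values here are illustrative sample data only.
TABLE = {
    ("de", "decimal_point"): ",",
    ("de_CH", "decimal_point"): ".",
}


def lookup(locale_name, feature, table=TABLE, defaults=DEFAULTS):
    """Find a feature for a locale, falling back step by step.

    Search order: exact locale ("de_CH"), then language only ("de"),
    then the default entry for the feature.
    """
    language = locale_name.split("_", 1)[0]
    for key in (locale_name, language):
        if (key, feature) in table:
            return table[(key, feature)]
    return defaults[feature]
```

With this arrangement a locale for which nothing has been installed
still yields a usable answer: lookup("ru_RU", "collate") simply
returns the default sorting function.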

My interest (and my interest in monitoring this e-mail group) lies in 
the possibility of getting involved in an open WYSIWYG document 
editor based on XML and UTF-8, so I (and others) can get out of the 
thrall of Word. To be successful, such an application will HAVE to be
a) WYSIWYG; b) multi-platform; c) able to read and dump RTF, even if
the result is crippled; d) modular and open (source and APIs), both
to spread the development effort and to encourage its use.

The last point brings home the reason for the initial comments. There 
is already a LOT of code out there that does much of what is needed, 
but it is locked into various applications: it needs to be part of a 
library with an open set of APIs. Then, it can be added to 
independently, and pieces of code can be tuned (and bugs fixed) where 
necessary without breaking existing apps, and without re-inventing 
the wheel every time.

Such a project is BIG, and it frightens me, but I have wanted to 
start it for a long time now. I have recently retired and have a bit 
of spare time, so I am starting to accumulate data to assess the
possibilities, and equipment on which to run Linux and OpenBSD; I
already have several Macs.

One possibility is to use Java as the development environment,
with its Text and window facilities, since they really ARE portable,
but I am not sure of two things: how open the result would be in
terms of the OSF's aims and views, and how efficient the code would be.

Having said all this, I should add that I haven't begun to
evaluate all that is available, and I only have a vague idea at the
moment of what some of the programs mentioned DO (luit and yudit, for
instance). I hope to get up steam on this over Christmas, after I get
equipment installed and running, and have time to read more web 
material on the projects and to download open source development 
tools and sources.

George
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
