Edward H. Trager wrote:

> Inclusion of precomposed characters is a compromise aimed at achieving
...
> (Nobody is forcing me to use precomposed forms if I don't like them:
> Unicode also provides the combining characters). 

True. But I have the impression that Keld rather likes precomposed...

> What if we had a virtual "Keyboard" class from which we could 
> derive two sub-classes: a "DecomposedKeyboard" class, and a 
> "PrecomposedKeyboard" class?  

Classes? I find it somewhat scary to have to define (C++??, Java???)
classes to make keyboard layouts. At present, Linux (and other
Unixes) mostly use XKB, which takes keyboard layouts in a text
format that is unrelated to any "general purpose programming
language". Likewise, Mac OS X has an XML-based data file format for
specifying keyboard layouts. (Both systems "compile" these to
something more efficient for use at runtime.)

> These base classes would support non-European languages just the same.
> For example, an ArabicDecomposedKeyboard would emit u0628 BEH 
> + u0646 NOON
> for the "ﱭ" uFC6D BEH WITH NOON FINAL FORM LIGATURE, while the
> ArabicPrecomposedKeyboard would simply emit uFC6D.  If the user knows
> he has to interface or send data to some legacy system, then he knows
> which one to choose.  Otherwise, he doesn't care and goes 
> with the default "Decomposed" method for his language.

By and large, "decomposed" vs. "precomposed" only makes sense
for Latin, Greek, Cyrillic, Hangul, and Hiragana/Katakana.
There are precomposed letters for other scripts too, but mostly they
are not to be recommended. In particular, the ones you find in U+Fxxx
should not be used (unless you REALLY have to). The one you referred
to is, in addition, a contextual form (a FINAL form) and should only be
used at the end of a word (of the word as written so far). Normally,
these things are handled dynamically via contextual shaping rules,
and the presentation form *characters* are NOT used. A data file on
Arabic shaping rules is maintained by Unicode (but NOT by ISO):
ArabicShaping.txt.
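
To make the distinction concrete, here is a minimal sketch using the
Unicode normalization forms, via Python's standard unicodedata module.
The helper names (to_decomposed, to_precomposed,
fold_presentation_forms) are made up for this illustration and have
nothing to do with XKB or any keyboard layout format. It only shows
that Latin and Hangul have canonical precomposed/decomposed pairs,
while U+FC6D has merely a compatibility mapping back to BEH + NOON.

```python
import unicodedata


def to_decomposed(text: str) -> str:
    """Canonical decomposition (NFD): base letters followed by combining marks."""
    return unicodedata.normalize("NFD", text)


def to_precomposed(text: str) -> str:
    """Canonical composition (NFC): use precomposed letters where they exist."""
    return unicodedata.normalize("NFC", text)


def fold_presentation_forms(text: str) -> str:
    """Compatibility normalization (NFKC): among other things, replaces
    presentation forms such as U+FC6D with the ordinary letters."""
    return unicodedata.normalize("NFKC", text)


def code_points(text: str) -> str:
    """Render a string as a list of U+XXXX code points for display."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    samples = {
        "Latin e-acute": "\u00e9",      # precomposed LATIN SMALL LETTER E WITH ACUTE
        "Hangul syllable GA": "\uac00",  # precomposed Hangul syllable
        "Arabic U+FC6D": "\ufc6d",       # BEH WITH NOON, final-form ligature
    }
    for label, ch in samples.items():
        print(label)
        print("  NFD: ", code_points(to_decomposed(ch)))
        print("  NFC: ", code_points(to_precomposed(ch)))
        print("  NFKC:", code_points(fold_presentation_forms(ch)))
```

Note that neither NFD nor NFC touches U+FC6D: the canonical forms
leave the presentation form alone, and only the compatibility form
maps it back to the plain letters. Choosing the final-form glyph is
then left to the shaping rules at display time.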

Why do something entirely different for the "console"? Why not adapt
XKB so that it, and its data files, can work for the "console" too?
(Likewise for an input method mechanism (XIM??).)

                /kent k

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
