Followup to: <000801c19a78$28df7250$b4e21081@chalmers95a69n>
By author: "Kent Karlsson" <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> Source separation rule also for the 8859 series of standards gives that
> they had to be separately encoded.
>
> But even so, they had to be separated: similar-looking uppercase forms
> have different corresponding lowercase forms. So as not to make case
> mapping horribly difficult (it's hard enough as it is!), Latin, Greek,
> and Cyrillic had to be non-unified.
>
Using that rule, there should be a TURKISH CAPITAL I which lowercases
as U+0131 LATIN SMALL LETTER DOTLESS I, and similarly, there should be
a TURKISH LOWER CASE I which uppercases as U+0130 LATIN CAPITAL LETTER
I WITH DOT ABOVE.
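
To make the Turkish problem concrete, here is a rough Python sketch. It
first shows the default, locale-independent mappings of the shared
codepoints, then a pair of helpers that apply the Turkish pairing by hand.
The names turkish_upper and turkish_lower are my own, purely for
illustration; they are not part of any library, and a real application
would more likely use a locale-aware library for this.

```python
# Codepoints involved in the Turkish dotted/dotless i problem.
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default mappings: U+0049 and U+0069 are shared between Turkish and every
# other Latin-script language, so they pair up with each other.
default_lower_of_I = "I".lower()   # "i", not the dotless i Turkish wants
default_upper_of_i = "i".upper()   # "I", not the dotted capital Turkish wants


def turkish_upper(text):
    """Hypothetical helper: uppercase using the Turkish i pairing."""
    # Handle the two special letters first, then let the default
    # mapping take care of everything else.
    text = text.replace("i", DOTTED_CAPITAL_I)
    text = text.replace(DOTLESS_SMALL_I, "I")
    return text.upper()


def turkish_lower(text):
    """Hypothetical helper: lowercase using the Turkish i pairing."""
    text = text.replace("I", DOTLESS_SMALL_I)
    text = text.replace(DOTTED_CAPITAL_I, "i")
    return text.lower()


if __name__ == "__main__":
    print(turkish_upper("iyi"))
    print(turkish_lower("DIYARBAKIR"))
```

The point is only that the same codepoint needs a different case partner
depending on the language, which is exactly what separate TURKISH letters
would have avoided.
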
I do agree with you on the source separation rule, but I also
maintain that there was not much pressure to unify the alphabetic
scripts simply because of the small number of codepoints concerned:
all of Latin, Greek and Cyrillic including modifiers, combining
characters, numbers and unassigned codepoints still account for less
than 2000 codepoints; all the alphabetic scripts in the BMP combined
account for only 8192 codepoints (not counting the Arabic presentation
forms and halfwidth/fullwidth compatibility variants, but counting
unallocated codepoints).
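
As a rough check of that arithmetic, the sketch below adds up the sizes of
the blocks I would count as Latin, Greek and Cyrillic. Which blocks belong
in the tally is my assumption (other selections are defensible), and the
8192 figure is read here as the range U+0000..U+1FFF, which is also an
assumption about how the count was made.

```python
# (block name, first codepoint, last codepoint); the selection of blocks
# is an assumption made for this illustration.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("IPA Extensions", 0x0250, 0x02AF),
    ("Spacing Modifier Letters", 0x02B0, 0x02FF),
    ("Combining Diacritical Marks", 0x0300, 0x036F),
    ("Greek", 0x0370, 0x03FF),
    ("Cyrillic", 0x0400, 0x04FF),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
    ("Greek Extended", 0x1F00, 0x1FFF),
]

# Block sizes include unassigned codepoints, as in the count above.
latin_greek_cyrillic = sum(last - first + 1 for _, first, last in BLOCKS)

# Assumed reading of "all the alphabetic scripts": U+0000..U+1FFF.
alphabetic_area = 0x1FFF - 0x0000 + 1

if __name__ == "__main__":
    print(latin_greek_cyrillic, alphabetic_area)
```
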
-hpa
--
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[EMAIL PROTECTED]>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/