RE: filename and normalization (was gcc identifiers)

Maiorana, Jason Wed, 04 Dec 2002 14:52:01 -0800


>> For that reason, I don't like form D at all.  I wonder how much space
>> it would take to represent every possible Jamo-combination, then just
>> do away with combining characters altogether...
>  No way!!  The biggest blunder ever made by the Korean nat'l standard body
>is to insist that 11,172 modern precomposed syllables be encoded
>in Unicode/10646. The next biggest blunder they made is to encode tens
>of totally unnecessary cluster-Jamos when only 17+11+17+ a few more
>would have been more than sufficient. The next stupid thing they did is
>to remove compatibility decomposition between cluster Jamos and basic
>Jamo sequences although they should be canonically (not just compatibly)
>equivalent.  Now, you're saying that all possible combinations of them
>be encoded. How many? It's __infinite__ in theory. In practice, it could
>be around 1.5 million.  That's more than the total number of codepoints
>available in the 20.1-bit coded character set which is ISO 10646/Unicode.
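
For reference, the figures in the quoted paragraph can be checked with a
little arithmetic. This is only a rough sketch: the 19/21/28 breakdown of
modern syllables and the U+0000..U+10FFFF code space are standard Unicode
facts, while the 1.5 million estimate is simply taken from the quote above
and is not verified here. The variable names are made up for illustration.

```python
import math

# Modern Hangul syllable blocks combine 19 leading consonants, 21 vowels,
# and 27 trailing consonants plus the "no trailing consonant" case.
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
precomposed = L_COUNT * V_COUNT * T_COUNT

# Unicode/ISO 10646 code space: U+0000..U+10FFFF (17 planes of 65,536).
code_space = 0x10FFFF + 1
bits = math.log2(code_space)

# Assumption: the estimate quoted above, taken at face value.
estimate = 1_500_000

print(precomposed)             # 11172
print(code_space)              # 1114112
print(round(bits, 1))          # 20.1
print(estimate > code_space)   # True
```

So an estimate of that size would indeed not fit in the code space, even
if every codepoint were free.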


Wow, ok, I guess that idea won't work for Korean.
Also, since glyph swapping has to be done for merely adjacent characters,
doing it for combining ones must be a relatively minor concern.

Out of curiosity, how many of those Korean letters are actually
made use of by the language? 1.5 million sounds higher than any
number of phonemes that a human can produce... (what if the
cluster jamos were dropped?)

Are we heading for a long-run scenario where Form D becomes canonical
and all the old precomposed codepoints are deprecated? NFC seems
to be getting more and more entrenched from what I can tell...
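
To make the Form D vs. Form C distinction concrete, here is a minimal
sketch using Python's standard unicodedata module. The sample syllable
(U+D55C) is just an arbitrary choice for illustration; the point is that
the precomposed syllable and the conjoining Jamo sequence convert into
each other without loss.

```python
import unicodedata

# Arbitrary sample: U+D55C HANGUL SYLLABLE HAN (one precomposed codepoint).
syllable = "\uD55C"

# Form D splits it into conjoining Jamo: leading consonant, vowel, trailing.
nfd = unicodedata.normalize("NFD", syllable)

# Form C recomposes the Jamo sequence into the precomposed syllable.
nfc = unicodedata.normalize("NFC", nfd)

print([f"U+{ord(c):04X}" for c in nfd])  # ['U+1112', 'U+1161', 'U+11AB']
print(nfc == syllable)                   # True
```

Either form carries the same text, so which one ends up "canonical" for
filenames is a matter of convention rather than of information lost.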

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
