On Wed, 4 Dec 2002, Maiorana, Jason wrote:
> If characters are ever introduced which have no precomposed codepoint,
> then it will be difficult for a font to "normalize" them to one
> glyph which has the appropriate internal layout. The font file itself
> would then have to know about composition rules, such as when
> X is composed with Y then Z, then use this glyph XYZ which has no
> single codepoint in Unicode.

Have you ever heard of OpenType and AAT fonts? Modern font technologies
and modern rendering engines (Pango, Uniscribe, AAT/ATSUI, Graphite) can
all do that. Otherwise, how would Indic scripts be rendered at all? What
you describe above is done every day by Pango, Uniscribe, AAT/ATSUI,
and Graphite.

> For that reason, I don't like form D at all. I wonder how much space
> it would take to represent every possible Jamo combination, then just
> do away with combining characters altogether...

No way!! The biggest blunder ever made by the Korean national standards
body was to insist that 11,172 modern precomposed syllables be encoded
in Unicode/ISO 10646. The next biggest blunder was to encode tens of
totally unnecessary cluster Jamos, when 17+11+17 plus a few more would
have been more than sufficient. The next stupid thing they did was to
remove the compatibility decomposition between cluster Jamos and basic
Jamo sequences, although they should be canonically (not just
compatibly) equivalent. Now you're saying that all possible
combinations of them should be encoded. How many? It's __infinite__ in
theory. In practice, it could be around 1.5 million. That's more than
the total number of code points available in a 20.1-bit coded character
set, which is what ISO 10646/Unicode is.

  Jungshik Shin

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
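[Editor's note: the 11,172 figure above is the standard Hangul syllable
count, 19 leading consonants x 21 vowels x 28 trailing positions (one of
which is "none"), and the composition NFC performs on conjoining Jamos
is purely arithmetic per the Unicode Standard, with no font involvement.
A quick sketch of both facts in Python, using only the standard
unicodedata module:]

```python
import unicodedata

# Hangul syllable composition in Unicode is pure arithmetic:
#   S = 0xAC00 + (L_index * 21 + V_index) * 28 + T_index
LEADING, VOWELS, TRAILING = 19, 21, 28  # TRAILING includes "no final consonant"
print(LEADING * VOWELS * TRAILING)      # -> 11172 precomposed syllables

# The conjoining Jamo sequence <HIEUH U+1112, A U+1161, NIEUN U+11AB>
# composes under NFC to the single precomposed syllable U+D55C (HAN).
jamo = "\u1112\u1161\u11AB"
syllable = unicodedata.normalize("NFC", jamo)
print(f"U+{ord(syllable):04X}")         # -> U+D55C

# NFD decomposes it back to the same three-Jamo sequence.
assert unicodedata.normalize("NFD", syllable) == jamo
```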
