Re: Yudit and Tamil

Jungshik Shin Thu, 20 Dec 2001 08:18:54 -0800

On Thu, 20 Dec 2001, Gaspar Sinai wrote:

> On Wed, 19 Dec 2001 [EMAIL PROTECTED] wrote:
>
> > >Issues:
> > >1. Can Yudit just 'standardise' on its own?


> > How can you standardize without fonts?

> > >2. Is there a way to, say, register private
> > >area precompositions?
> >
> > There's no way to register anything in the private use zone.
> > There's an unofficial conlang registry, but this wouldn't

> Thanks for the info. If anyone knows anything about conlang
> registry - good or bad please tell me.

  I guess it's the ConScript registry maintained by
John Cowan and Michael Everson (both are members of UTC and
perhaps ISO/IEC JTC1/SC2/WG2 as well; of course, this doesn't
give any official status to ConScript). The info is available
at <http://www.evertype.com/standards/csur/>. However, I don't
think ConScript is for encoding presentation forms or precomposed
ligatures/characters; it is rather for encoding scripts
(artificial/constructed) that would never make it into
Unicode/ISO 10646.

  I would stand corrected if conlang is another project
distinct from ConScript.


> > >3. I think presentation should be part of a
> > >standard. What do you think?

> > Also, including all precomposed characters in Unicode would
> > increase the number of characters immensely. Latin characters

  You're right. Some Korean TTF/OTFs can *dynamically*
generate over a million (10^6) syllables out of ~100 basic alphabetic
letters. Including even 10k precomposed syllables in the BMP took a lot
of persuasion and politicking, and there's no way to include a million
precomposed forms. (BTW, in retrospect, I believe it was the worst
blunder ever made by the Korean standards body. I wish Unicode and ISO
10646 had not given in on this issue and had encoded only the 2350
precomposed syllables needed for compatibility with the legacy Korean
character set KS C 5601/KS X 1001.)
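
  To put a rough number on the precomposed set: the modern syllables
in the BMP can be computed arithmetically from 19 leading consonants,
21 vowels and 27 optional trailing consonants. Below is a minimal
Python sketch of that arithmetic. The function name `compose` is mine,
for illustration only; the constants are those of the Hangul syllable
composition algorithm in the Unicode Standard.

```python
import unicodedata

S_BASE = 0xAC00                          # first precomposed syllable, U+AC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no trailing consonant"

def compose(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Return the precomposed syllable for the given jamo indices."""
    if not (0 <= l_index < L_COUNT
            and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

# Number of modern precomposed syllables: 19 * 21 * 28
TOTAL = L_COUNT * V_COUNT * T_COUNT

# Cross-check one syllable against the standard library's NFC composition:
# U+1112 (leading h) + U+1161 (vowel a) + U+11AB (trailing n)
jamo = "\u1112\u1161\u11ab"
assert unicodedata.normalize("NFC", jamo) == compose(18, 0, 4)
```

  TOTAL comes out as 11172, which is the "10k" set. The same three
conjoining letters can always be stored as a sequence instead, so no
syllable strictly needs a precomposed code point of its own.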


> > >Do the glyphs in the private use area have to be
> > >"registered" somewhere? Otherwise, we would have to
> > >define different precompose sequences for each font,
> > >don't we?
> >
> > You can't register glyphs in the private use area. There
> > are several preexisting mappings over large sections of
> > the private use zone.

  I agree.

> > Either you'll have to make a font
> > standard or define different precompose sequences for each
> > font or

   I'm afraid this is rather hard to do.


> > encode the precompose sequences in the font. See
> > below for BDF fonts, or look up OpenType for ligatured
> > scalable fonts.

  I think this is the way to go.  I know it's easier said than
done, but...

> > http://www.wholehog.fsnet.co.uk/robert/indic/fonts.html

  Whether you use the BDF font hack or the OpenType extension
for 'ligature', what you have to do is implement what Uniscribe (in
MS-Windows), ATSUI (in MacOS), and Pango do: translate a sequence of
characters (with proper Unicode code points) to a sequence of glyph
indices in a given font.  You're already doing it for Tamil, using
a configuration file *external* to a font to convert a sequence of
characters to the index of a precomposed glyph with a codepoint in the PUA.

  What's the difference, then? In the former case, the mapping from a
sequence of characters (with official Unicode codepoints) to a sequence
of glyphs/presentation forms (without Unicode codepoints) is *stored*
*inside* fonts so that any program (that knows how to do that) can
read off the mapping table to do the conversion.  On the other hand,
in your current approach the mapping table is *external to* fonts, so
you need different mapping tables for different fonts, which is
why you're seeking to standardize PUA code assignments, if I understand
correctly. However, if you follow the Uniscribe/ATSUI path (which is what
the Unicode/ISO 10646 architects envisioned when deciding not to encode
precomposed forms not found in legacy character sets), you don't have to
worry about a proliferation of different mappings between characters and
glyphs in different fonts. BTW, I may have to add Pango to Uniscribe/ATSUI
because I believe the Pango team has made a lot of progress on this front.
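
  To make the contrast concrete, here is a minimal Python sketch of
the font-internal approach. Everything in it is hypothetical: the
`Font` class, the `shape` function and the glyph indices are made up
for illustration, and this is not how Uniscribe, ATSUI, Pango or Yudit
are actually implemented. It only shows the idea that each font carries
its own table and the text itself never contains PUA codepoints.

```python
# Tamil KA, VIRAMA (pulli) and SSA; KA + VIRAMA + SSA forms a ligature.
KA, VIRAMA, SSA = "\u0b95", "\u0bcd", "\u0bb7"

class Font:
    """A toy font: glyph indices are arbitrary, hypothetical numbers."""
    def __init__(self, name, cmap, ligatures):
        self.name = name
        self.cmap = cmap            # single character -> glyph index
        self.ligatures = ligatures  # character sequence -> glyph index
        self.max_len = max((len(k) for k in ligatures), default=1)

def shape(text, font):
    """Greedy longest-match conversion of characters to glyph indices."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(min(font.max_len, len(text) - i), 1, -1):
            gid = font.ligatures.get(text[i:i + n])
            if gid is not None:
                glyphs.append(gid)
                i += n
                break
        else:
            # No ligature matched: fall back to the one-to-one mapping.
            # Index 0 stands for the "missing glyph" here.
            glyphs.append(font.cmap.get(text[i], 0))
            i += 1
    return glyphs

# Two fonts with different internal glyph numbering.
font_a = Font("A", {KA: 10, VIRAMA: 11, SSA: 12}, {KA + VIRAMA + SSA: 200})
font_b = Font("B", {KA: 5, VIRAMA: 6, SSA: 7}, {KA + VIRAMA + SSA: 91})

text = KA + VIRAMA + SSA + KA   # plain Unicode text, no PUA codepoints
glyphs_a = shape(text, font_a)
glyphs_b = shape(text, font_b)
```

  The same `text` yields different glyph indices for the two fonts, but
nothing outside the fonts has to agree on those numbers, which is why
no PUA assignment needs to be standardized.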

   Jungshik Shin

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
