On Thu, 20 Dec 2001, Gaspar Sinai wrote:

> On Wed, 19 Dec 2001 [EMAIL PROTECTED] wrote:
> >
> > >Issues:
> > >1. Can Yudit just 'standardise' on its own?
> > How can you standardize without fonts?
> > >2. Is there a way to, say, register private
> > >area precompositions?
> >
> > There's no way to register anything in the private use zone.
> > There's an unofficial conlang registry, but this wouldn't

> Thanks for the info. If anyone knows anything about conlang
> registry - good or bad please tell me.

I guess it's the ConScript registry maintained by John Cowan and
Michael Everson (both are members of the UTC and perhaps of ISO/IEC
JTC1/SC2/WG2 as well; of course, this doesn't give ConScript any
official status). The info is available at
<http://www.evertype.com/standards/csur/>. However, I don't think
ConScript is meant for encoding presentation forms or precomposed
ligatures/characters. It is rather for encoding scripts
(artificial/constructed ones) that would never make it into
Unicode/ISO 10646. I stand corrected if the conlang registry is
another project distinct from ConScript.

> > >3. I think presentation should be part of a
> > >standard. What do you think?

> > Also, including all precomposed characters in Unicode would
> > increase the number of characters immensely. Latin characters

You're right. Some Korean TTF/OTFs can *dynamically* generate over a
million (10^6) syllables out of ~100 basic alphabetic letters.
Including even some 10k precomposed syllables in the BMP took a lot of
persuasion and politicking, and there's no way to include a million
precomposed forms. (BTW, in retrospect, I believe it was the worst
blunder ever made by the Korean standards body. I wish Unicode and
ISO 10646 had not given in on this issue and had encoded only the 2350
precomposed syllables needed for compatibility with the legacy Korean
character set KS C 5601/KS X 1001.)

> > >Do the glyphs in the private use area have to be
> > >"registered" somewhere? Otherwise, we would have to
> > >define different precompose sequences for each font,
> > >don't we?
> >
> > You can't register glyphs in the private use area. There
> > are several preexisting mappings over large sections of
> > the private use zone.

I agree.

> > Either you'll have to make a font
> > standard or define different precompose sequences for each
> > font or

I'm afraid this is rather hard to do.

> > encode the precompose sequences in the font. See
> > below for BDF fonts, or look up OpenType for ligatured
> > scalable fonts.

I think this is the way to go. I know it's easier said than done,
but...

> > http://www.wholehog.fsnet.co.uk/robert/indic/fonts.html

Whether you use the BDF font hack or the OpenType extension for
'ligature', what you have to do is implement what Uniscribe (in
MS-Windows), ATSUI (in MacOS), and Pango do: translate a sequence of
characters (with proper Unicode code points) into a sequence of glyph
indices in a given font. You're already doing it for Tamil, using a
configuration file *external* to a font to convert a sequence of
characters to the index of a precomposed glyph with a codepoint in
the PUA.

What's the difference, then? In the former case, the mapping from a
sequence of characters (with official Unicode codepoints) to a
sequence of glyphs/presentation forms (without Unicode codepoints) is
*stored* *inside* fonts, so that any program that knows how to do so
can read the mapping table and do the conversion. In your current
approach, on the other hand, the mapping table is *external to*
fonts, so you need different mapping tables for different fonts,
which is why you're seeking to standardize PUA code assignments, if I
understand correctly.

However, if you follow the Uniscribe/ATSUI path (which is what the
Unicode/ISO 10646 architects envisioned when they decided not to
encode precomposed forms not found in legacy character sets), you
don't have to worry about a proliferation of different mappings
between characters and glyphs across fonts. BTW, I should probably
add Pango alongside Uniscribe/ATSUI, because I believe the Pango team
has made a lot of progress on this front.
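
To make the 'mapping stored inside the font' idea a bit more concrete,
here is a rough sketch in Python. It is only an illustration of the
general technique (greedy longest-match substitution of character
sequences by glyph indices), not how Yudit, Uniscribe, ATSUI, Pango or
any real font format does it. The names Font, shape, FONT_A and
FONT_B, the table layout and all glyph indices are made up for this
example; only the three Tamil code points are real.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

KA, VIRAMA, SSA = "\u0b95", "\u0bcd", "\u0bb7"  # Tamil KA, VIRAMA, SSA


@dataclass
class Font:
    """Hypothetical font: both tables travel with the font itself."""
    cmap: Dict[str, int]                    # single character -> glyph index
    ligatures: Dict[Tuple[str, ...], int]   # character sequence -> glyph index


def shape(text: str, font: Font) -> List[int]:
    """Convert characters to glyph indices, preferring the longest ligature."""
    max_len = max((len(seq) for seq in font.ligatures), default=1)
    glyphs: List[int] = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(min(max_len, len(text) - i), 1, -1):
            seq = tuple(text[i:i + n])
            if seq in font.ligatures:
                glyphs.append(font.ligatures[seq])
                i += n
                break
        else:
            # No ligature starts here: fall back to the plain cmap entry,
            # using 0 as an assumed "missing glyph" index.
            glyphs.append(font.cmap.get(text[i], 0))
            i += 1
    return glyphs


# Two invented fonts that number their glyphs differently.
FONT_A = Font(cmap={KA: 10, SSA: 11, VIRAMA: 12},
              ligatures={(KA, VIRAMA, SSA): 300})
FONT_B = Font(cmap={KA: 1, SSA: 2, VIRAMA: 3},
              ligatures={(KA, VIRAMA, SSA): 57})

text = KA + VIRAMA + SSA + KA
glyphs_a = shape(text, FONT_A)
glyphs_b = shape(text, FONT_B)
```

The text carries only official Unicode code points, and the same
shape() call works with either font even though the two fonts disagree
on every glyph index. No PUA assignment has to be agreed upon, because
each font brings its own table.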

Jungshik Shin

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
