On Mon, Dec 03, 2007 at 02:16:00PM +1100, Russell Shaw wrote:
> Hi,
> I was thinking of making a multilingual text editor.
>
> I don't get how glyphs are done outside of english.
>
> I've read the Unicode Standard book.
>
> When a paragraph of unicode characters is processed, the glyphs
> are layed out according to the state contained in the unicode
> character sequence.
>
> Depending on this state, the same unicode characters can map to
> multiple glyphs depending on context.
>
> If multiple fonts exist for a language, then for all these font
> files to work with an editor, then all these glyphs must be indexed
> the same.
>
> Where can i find the standard that specifies what glyphs are indexed
> by what number? Or are these glyphs created on the fly by the unicode
> paragraph layout processor?

The relevant standard is OpenType. OpenType fonts contain the tables
needed for mapping sequences of characters to glyphs. Glyph indexing is
specific to the particular font being used; there is no standard across
fonts, and in fact some fonts use precomposed glyphs while others use
constituent glyphs with positioning information to achieve the same
result. OpenType was designed by Microsoft (later joined by Adobe) as an
abstraction of TrueType and Type1 fonts with the features needed for
proper Unicode rendering.

On Windows, Uniscribe (USP10.DLL) is the code responsible for processing
these tables. Correctly multilingualized applications use its functions
for text rendering (all the standard Windows controls already do that on
the application's behalf).

The situation on Linux and *nix is a bit more diverse. Both GTK+ and Qt
widgets provide semi-correct OpenType handling, but with lots of
mistakes in handling scripts/languages their developers are not very
familiar with. Qt uses its own code for this, while GTK+ uses the Pango
library, an extremely slow “complex text layout” library which does a
lot more than is needed for most uses, and which duplicates most of the
font-specific logic in code, causing lots of headaches in addition to
bloat and bad performance. (Firefox with Pango enabled is many times
slower than without; this is why many distributions still have Pango
support disabled by default, causing many languages not to work...)

I’m very much hoping for a future direction of proper OpenType rendering
support without the need for Pango, but it requires someone spending
some time to understand the problem domain. Basically it’s just a matter
of applying the font's substitution tables, and hard-coding lists of
which tables are needed for which Unicode scripts and the order in which
they should be applied.
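
To make the point about substitution tables a bit more concrete, here is
a toy sketch in Python. It is not real OpenType parsing: the two "fonts",
their glyph numbers, the ligature rule, and the per-script feature order
are all made up for illustration, and shape() and apply_feature() are
hypothetical names. It only shows that glyph indices are private to each
font, and that shaping amounts to mapping characters to glyphs and then
running the font's substitution rules in an order fixed by the shaper.

```python
# Toy model of OpenType-style glyph substitution.
# All glyph IDs, rules, and the feature order below are invented.

# Two hypothetical fonts: same characters, different glyph numbering.
FONT_A = {
    "cmap": {"f": 10, "i": 11, "a": 12},
    "features": {
        # 'liga' rule: glyphs for f + i are replaced by one ligature glyph.
        "liga": {(10, 11): 50},
    },
}
FONT_B = {
    "cmap": {"f": 3, "i": 7, "a": 1},
    "features": {},  # this font has no ligature glyph at all
}

# Hard-coded in the shaper, not in the font: which features to apply
# for each script, and in what order (the order here is an assumption).
SCRIPT_FEATURES = {"latn": ["ccmp", "liga"]}


def apply_feature(glyphs, rules):
    """Replace the longest matching glyph sequence at each position."""
    out = []
    i = 0
    max_len = max((len(key) for key in rules), default=0)
    while i < len(glyphs):
        for n in range(max_len, 0, -1):
            key = tuple(glyphs[i:i + n])
            if len(key) == n and key in rules:
                out.append(rules[key])
                i += n
                break
        else:
            # No rule matched here: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def shape(text, font, script):
    """Map characters to font-specific glyph IDs, then substitute."""
    glyphs = [font["cmap"][ch] for ch in text]
    for tag in SCRIPT_FEATURES.get(script, []):
        rules = font["features"].get(tag)
        if rules:
            glyphs = apply_feature(glyphs, rules)
    return glyphs


if __name__ == "__main__":
    print(shape("fia", FONT_A, "latn"))
    print(shape("fia", FONT_B, "latn"))
```

The same text comes out as a different glyph sequence for each font,
which is why an editor has to store characters and leave glyph numbers
to the shaping step. A real implementation would also need positioning
data and contextual rules, which this sketch leaves out.
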
(Originally they were intended to be applied in the order they appear in
the font files, but then MS went and made their implementation hard-code
the order, so other implementations need to follow that in order to
handle fonts properly, or at least that’s my understanding.)

The OpenType specs themselves are available at Microsoft’s website, but
they’re very poorly written. Reading them alone is insufficient to make
an implementation unless you already know basically what the
implementation must do, IMO; they are something like RFC 1459 in
quality...

There’s a (semi-)new library called HarfBuzz which, as I understand it,
is purely the OpenType logic, without all the bloat of Pango. I’m not
sure what stage it’s at these days, but it might be a good place to
begin your search.

Of course if your app depends on GTK+ or Qt you can just use their
widgets and forget about the whole issue, but I hope someone will move
things forward for OpenType font support without the need for these
toolkits.

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/