On Thu, 2 Jan 2003, Edward Cherlin wrote:

> On Sunday 24 November 2002 06:22 pm, Gaspar Sinai wrote:
> > Hi,
> > I have a new rendering schema in mind that I am going to
> > implement in a future version of Yudit.
>
> I'm going to request that you not do this. Please concentrate your
> efforts on Yudit, which is my editor of choice. Please consider
> using Pango or Graphite instead. At least talk to the developers
> about the issues they see in implementing a rendering engine and
> modifying the OpenType format.
It is very unlikely that I will have enough time for this, so don't
worry :) Of course Yudit enhancements are the highest priority now.
Thank you very much for using Yudit; I am very happy that I created
something useful. I wish I had more time for it - if I had, it would
probably not suffer from the 'least effort' design effects...

> > The major problems with it are:
> >
> > ① Very difficult to test the rendering software. Features need
> > to be applied in a certain order. Even one mix-up will result in
> > bad rendering, and a very complex test has to be performed
> > manually to catch the bug.
>
> Any rendering engine will be difficult to test, because of the
> complexity of the fundamental processes. If not OT features, the
> same capabilities have to be implemented some other way.

What I miss most during testing is automation. This is what I would
like to have with a reversible rendering algorithm.

> > ② OpenType tables cannot be shared between two fonts of the same
> > type, although similar positioning/substitution needs to be
> > performed. This makes the font file unnecessarily big.
>
> If this is a real issue, then we can devise a way to share the
> tables. I don't think it is an issue. Most fonts are well under
> 100K, except CJK.

I would be very happy if tables could be shared. This would be very
useful when multiple fonts are used on a machine without much
physical memory.

> > ③ OpenType is not an Open Standard.
>
> Defining a new font format might be a good idea, if we can get the
> designers of font servers and rendering engines to agree on it.
>
> > ④ Rendering is a non-reversible process
>
> True regardless of the engine.

I would argue with this. With a small and distinct set of glyphs,
especially when the script is known, even the screen image could be
converted back to Unicode. Of course this would require more effort,
but it is not impossibly difficult in the way that, say, translating
into a different language is. And we don't need to go that far: it
would be enough to convert the glyph components back into the
original input (see the glyph component definition later in this
mail; I do not have better terminology - in OTF they simply call them
glyphs, even though they may be just part of a cluster). What is the
right word for these tiny components that make up the clusters? Like
a variant of a jamo that appears at a specific place.

When writing a program I usually put very high emphasis on
testability. (Well, this is mostly true at work; for Yudit I have
just a few hours or less of spare time.) I try to increase
testability by adhering to these principles:

- Use independent reversible algorithms to facilitate automatic
  testing.
- Use algorithms that have very few branches. If they have some, use
  algorithms where each branch will execute at least once during
  testing. Where possible, replace a branch with a mathematical
  formula that has a global effect when it is wrong.
- Randomly mix independent implementations of routines with
  parallel-path execution, so that any small bug has a global effect.
  This is just like cutting a wire in a multiplier circuit:
  multiplying random numbers shows the fault immediately, and if you
  test with a pattern designed for cut and short-circuited wires, the
  error shows up even faster.

So, returning to the point: I think it is possible to design a
rendering engine that can be tested. Even a Pentium processor can be
tested pretty well, and it contains a couple of orders of magnitude
more gates than Unicode has characters.
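To make the first principle concrete, here is a toy, self-contained
C++ sketch of the render/test round trip I have in mind. Everything
in it is invented for illustration: the GlyphElement record, the
glyph code for the ligature, and the single compulsory "fi" rule
stand in for a real per-script repertoire of glyph elements.

    // Toy illustration of reversible rendering.  render() folds the
    // pair U+0066 U+0069 ("fi") into one invented ligature glyph;
    // unrender() expands it back.  The automated test then checks
    // unrender(render(s)) == s on random input.
    #include <cstdint>
    #include <cstdlib>
    #include <iostream>
    #include <vector>

    struct GlyphElement {
        uint32_t code;   // glyph-element code (step I would standardize these)
        int hotX, hotY;  // hot-spot position inside the cluster
    };

    static const uint32_t FI_LIGATURE = 0x10000;  // invented glyph code

    // Forward pass: Unicode -> glyph elements + positions.
    std::vector<GlyphElement> render(const std::vector<uint32_t>& text) {
        std::vector<GlyphElement> out;
        int x = 0;
        for (size_t i = 0; i < text.size(); ++i, ++x) {
            if (i + 1 < text.size() && text[i] == 'f' && text[i + 1] == 'i') {
                out.push_back({FI_LIGATURE, x, 0});  // one glyph, two chars
                ++i;
            } else {
                out.push_back({text[i], x, 0});      // fake advance only
            }
        }
        return out;
    }

    // Reverse pass: glyph elements -> the original Unicode stream.
    std::vector<uint32_t> unrender(const std::vector<GlyphElement>& glyphs) {
        std::vector<uint32_t> out;
        for (const GlyphElement& g : glyphs) {
            if (g.code == FI_LIGATURE) { out.push_back('f'); out.push_back('i'); }
            else                       { out.push_back(g.code); }
        }
        return out;
    }

    int main() {
        // Random streams over a tiny alphabet must survive the round
        // trip; a broken substitution rule fails almost immediately.
        const uint32_t alphabet[] = {'f', 'i', 'n', 'e'};
        for (int t = 0; t < 100000; ++t) {
            std::vector<uint32_t> text;
            for (int k = 0; k < 8; ++k)
                text.push_back(alphabet[std::rand() % 4]);
            if (unrender(render(text)) != text) {
                std::cout << "round trip failed\n";
                return 1;
            }
        }
        std::cout << "round trip ok\n";
        return 0;
    }

The point of the harness is that no test cases are constructed by
hand: any mix-up in the substitution rules breaks the round trip on
random input and has a global, immediately visible effect.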
Talking about hardware: has anyone made a hardware-accelerated
Unicode rendering engine yet? That would have an enormous impact.

> > The idea is:
> >
> > Ⅰ. Assign codes and hot spots for all possible glyph components,
> > per script, per language system.
>
> Impossible. You can't predefine all glyphs for a language, much
> less for a script.
>
> The first counterexample to consider is Korean, where proper
> rendering has to reshape and reposition glyphs in a two-dimensional
> context. There are about 40 letters, each of which appears in
> hundreds of slight variations, and it is not difficult to make up
> syllables which would require new variations.
>
> Next consider Arabic-script languages, where the set of ligatures
> is language- and font-specific, and remains highly variable.
>
> Indic scripts have similar problems with the variety of ligatures.
>
> When stacking combining characters, whether multiply-accented
> Latin, math symbols, or Sanskrit written in the Tibetan alphabet,
> it has not proven possible to list all of the possibilities. Some
> of the positioning and shaping has to be done dynamically.

By glyph elements I mean the smallest components of a cluster. The
process would look like converting Unicode to glyph elements and
positions, and, for testing, converting these elements and positions
back into Unicode - they should convert back to the same or an
equivalent Unicode stream:

  Render: U+XXXX U+YYYY U+ZZZZ  R->  G+MMMM G+NNNN
  Test:   G+MMMM G+NNNN         T->  U+XXXX U+YYYY U+ZZZZ
                                     (or an equivalent stream)

For Korean, just a few variants of each jamo are needed: they can sit
in just a few different places, at different sizes. Tibetan glyph
elements could also make do with just a few places and sizes. For
Indic and Arabic, the need for compulsory ligature definitions is
where I see a problem. In an OTF these glyph elements are defined
anyway: a font does not contain an endless number of glyphs, yet the
number of combinations is huge, just as in our case.

> > Ⅱ. Create a generic state machine that can step through the
> > input Unicode characters and spit out glyph components and their
> > relative hot spot positions.
> >
> > Ⅲ. Create states and a dynamically loadable state table per
> > script, per language system.
>
> Per font, in some scripts.

(A rough sketch of such a table-driven engine follows below.) This is
where the difficulty comes in: there is no standard today that says
which ligatures are compulsory and which are not. I suggest these
should be defined. For English speakers the simple fi ligature is the
usual example, and most people think it is as simple as that:
changing the font merely renders the fi differently. But there are
many combinations that, unlike fi, look completely different when
ligated - they do not look even remotely like their unligated
components. If we rely on the font, authors cannot trust plain text
files; they must use word processors to store the font information in
the file, to avoid unwanted ligation under a different font. Plain
text without compulsory ligature definitions is meaningless, IMHO.

CJK unification poses fewer problems: I can figure out which font
needs to be used for different portions of the text just by looking
at it. For Indic and Arabic I have no way of determining the font
that has the correct ligatures to display my plain text files the way
the author intended. As I understand it, Unicode defines scripts and
code points, but it does not go out of its way to define how they
should look on the screen. (Even glyph variant samples are not
defined for CJK in the Unicode book.)
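Returning to steps Ⅱ and Ⅲ, here is a rough but compilable sketch of
what I mean by a generic, script-independent engine driven by a
dynamically loadable state table. The table format, the input
classifier, and the jamo hot-spot numbers are all made up for the
example; a real engine would load the table from a per-script,
per-language-system file.

    // Generic one-pass engine: a (state, input class) -> transition
    // table drives it; the engine code itself knows nothing about
    // the script.  The tiny Hangul-flavoured table below is invented.
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <utility>
    #include <vector>

    struct Transition {
        int next;        // state to enter
        int hotX, hotY;  // hot spot for the emitted glyph element
    };

    // One loadable table: (current state, input class) -> transition.
    using StateTable = std::map<std::pair<int, int>, Transition>;

    // Toy classifier; a real table would carry its own classification.
    int classify(uint32_t cp) {
        if (cp >= 0x1100 && cp <= 0x1112) return 1;  // leading jamo
        if (cp >= 0x1161 && cp <= 0x1175) return 2;  // vowel jamo
        if (cp >= 0x11A8 && cp <= 0x11C2) return 3;  // trailing jamo
        return 0;                                    // anything else
    }

    int main() {
        // States: 0 = cluster start, 1 = saw leading, 2 = saw vowel.
        // Leading jamo sit at the cluster origin, vowels to the
        // right, trailing jamo below; the positions are invented.
        StateTable t = {
            {{0, 1}, {1, 0, 0}},
            {{1, 2}, {2, 6, 0}},
            {{2, 3}, {0, 3, 8}},
        };

        std::vector<uint32_t> text = {0x1112, 0x1161, 0x11AB};  // "han" in jamo
        int state = 0;
        for (uint32_t cp : text) {
            auto it = t.find({state, classify(cp)});
            int x = 0, y = 0;
            if (it != t.end()) {
                x = it->second.hotX;
                y = it->second.hotY;
                state = it->second.next;
            } else {
                state = 0;  // no rule: emit in place and restart
            }
            // Here the glyph element is the code point itself; step I
            // would assign real per-script element codes instead.
            std::cout << std::hex << "G+" << cp << std::dec
                      << " @(" << x << "," << y << ")\n";
        }
        return 0;
    }

The engine never branches on the script; only the table changes. And
since each transition is a simple lookup, a table like this can be
inverted mechanically, so the same round-trip test as in the previous
sketch applies.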
Authors who write plain text should be aware that the text may change
considerably when viewed with a different font. This is somewhat like
a handwritten document that gets printed - but at least handwritten
ligatures can appear as ligatures in print too. If compulsory
ligatures are not defined, I cannot even write a palindrome: with a
different font it may not look like a palindrome at all. An old
document may also look strange with ligatures invented later. In a
sense Unicode is pretty democratic: there are more viewers than
writers, so readers have more freedom :)

> > Ⅳ. Create bitmap and vector fonts. The glyph codepoints are
> > defined in (Ⅰ.), so this will be an easy process. Much easier
> > than creating OpenType tables.
>
> Exactly equivalent to creating OpenType tables. You just make fewer
> copies of the data (assuming availability of Free/Open Source
> fonts).

The difference is that in our case the font creators would have all
the glyph components defined in a consistent way. VOLT helps a lot
when creating OT fonts, but I fear it is still possible to miss
something and have one untested input combination rendered in an
unexpected way. There are too many branches, because of the
exceptions and because of the numerous features an OTF has.

> > Ⅶ. Use (Ⅱ) and pass it to (Ⅴ) to see if we get back our stream.
> > We can test the rendering engine on the fly this way.
> >
> > Ⅷ. Use (Ⅴ.) for OCR (character recognition) software to scan
> > text images into a Unicode stream.
> >
> > This is all - I am running out of Roman numerals ☺
> >
> > The merits of such a rendering/font schema would be:
> >
> > - Fonts do not need to carry extra tables
>
> Fewer copies, but not less data.

That is right, but this data could be shared, resulting in lower
memory consumption when the system is using several fonts.

> > - Rendering is linear and needs very little processing power.
>
> ?? Certainly this cries for more explanation.

First part of the explanation: I never read my outgoing mails twice
and I never use a spell checker - sorry for that. What I mean here is
that I would like to achieve the whole thing in one pass, instead of
going through the glyph substitution process in multiple passes.
Hopefully it won't look like my emails ;)

> > - It is testable
> >
> > - It is bitmap-font-friendly
>
> I can't imagine it. How do you do adjustable glyphs in bitmaps?

I can't see why the algorithm would not work with bitmap fonts...

Happy New Year!

G̳á̳s̳p̳á̳r̳ ・ ɡaːʃpaːr ガーシュパール・Гашпар・가스팔・גאשפאר・گاشپار
17-3*5

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
