Hi Werner,

Thank you for commenting on this.

On Mon, 25 Nov 2002, Werner LEMBERG wrote:

> > The idea is:
> > Assign codes and hot spots for all possible Glyph components,
> > per script, per language system.
>
> How will you handle open-ended scripts like Urdu where the number of
> ligatures is changing while the language evolves?  For example, I was
> told by an Urdu computer scientist that during a visit of Margaret
> Thatcher (a former Prime Minister of England) the newspapers created a
> new ligature for her name.
>
> > Create a generic state machine that can step through the input
> > unicode characters, and spit out Glyph components and their
> > relative hot spot positions.
>
> This is far more complicated I fear.  You will need fallback
> algorithms for fonts which don't provide some glyphs/ligatures, etc.
> Some fonts have e.g. `Amacron' as a single glyph, others compose it
> from `A' with a macron accent.
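To make the `Amacron' fallback case concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not code from any existing renderer: the function name `glyphs_for` and the `available` set (standing in for the characters a font covers) are hypothetical. It relies on Unicode canonical decomposition (NFD) to fall back from a precomposed character to base letter plus combining accent.

```python
import unicodedata


def glyphs_for(ch, available):
    """Pick what to draw for `ch` given the characters a font covers.

    `available` is a hypothetical stand-in for a font's coverage:
    a plain set of characters the font has glyphs for.
    Returns a list of characters to draw, or None if nothing fits.
    """
    # Case 1: the font has the precomposed glyph (e.g. `Amacron').
    if ch in available:
        return [ch]

    # Case 2: fall back to the canonical decomposition, e.g.
    # U+0100 -> U+0041 (A) + U+0304 (combining macron).
    parts = unicodedata.normalize("NFD", ch)
    if parts != ch and all(p in available for p in parts):
        return list(parts)

    # Case 3: no usable fallback in this font.
    return None


font_a = {"\u0100"}        # font with a single Amacron glyph
font_b = {"A", "\u0304"}   # font with A and a combining macron

print(glyphs_for("\u0100", font_a))
print(glyphs_for("\u0100", font_b))
```

A real implementation would also need hot spot positions for the composed case, which this sketch leaves out.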
Talking about ligatures, what I am really afraid of is encoding a
script today, viewing it with tomorrow's font, and not seeing what I
wrote today.  What I was thinking is that all the compulsory ligatures
need to be defined.  If a new ligature arrives, either a new script
code has to be created or ZWJ has to be used to form it.  This way the
ambiguity goes away.  For non-compulsory ligatures and non-ligatures
we could still use the ZWJ and ZWNJ characters.

I admit the task is not simple.  That's why I posted it instead of
just implementing it straight away.  The most complicated part is the
definitions.  The hard question is: within a script code and language
system, which ligatures are compulsory?

> > . Create a generic inverse state machine. The input is
> > components and their relative hot spot positions and the
> > output is unicode stream.
>
> You can do that already by following the Adobe Glyph List (AGL)
> algorithm for naming glyphs.

Thanks for the reference.  I also think there are a lot of things out
there from which we could learn.

Sorry if I cannot fully attend the discussion I started, but I am
extremely tied down with other things at the moment.

-- 
G̳á̳s̳p̳á̳r̳ ガーシュパール・Гашпар・가스팔・Γασπαρ・גאשפאר עברי 10-2*5

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
