Hi Samuel,

I had some discussions on this, and I think the problem can be resolved in the following way.

To add combining diacritics there is no need for extra support in GTK+;
this is handled by the keyboard layouts, which GTK+ does not manage.
That means you need a keyboard layout that produces all of those
combining diacritics.

The project for keyboard layouts is xkeyboard-config,
http://freedesktop.org/wiki/Software/XKeyboardConfig

For your case of Tagbanwa, you would create a new keyboard layout.
For the generic case of adding combining diacritics to arbitrary
characters, a catch-all keyboard layout could be used.

Currently, there is no GUI tool to create such keyboard layouts.
On your Linux system, keyboard layouts live in /etc/X11/xkb/symbols/
You can get an idea of how to modify an existing layout by looking at
the files there.

If you would like to pursue this further, I would be happy to give you
instructions.

Simos

On Feb 7, 2008 11:19 AM, Samuel Thibault <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Simos wrote:
> > In bug #341341, Danilo talks about support for compose sequences that
> > produce more than one Unicode character, as in
> > COMBINING ACUTE + CYRILLIC LATIN A where no precomposed form exists.
> > At the moment, the Xorg Compose file does not have such compose
> > sequences. If we were to implement in GTK+, I would suggest to build up
> > a new table of the form
> >
> > dead_acute, A, E, H, I, O, U, ... (assume all these Cyrillic)
> > dead_diaeresis, A, E, H, I, O, U, ... (assume all these Cyrillic)
>
> The problem is that this is very tedious for people who already have a
> hard time making Linux suit their language (fonts, messages, locales,
> ...) and can potentially be very big. For instance in Vietnamese you may
> need to put two accents on a vowel, and so you'd need to enumerate all
> such possible combinations.
>
> > In check_algorithmic, we currently check if the compose sequence can be
> > normalised to a single Unicode character.
>
> Which is necessary for proper string uniqueness/comparison etc, yes.
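
For reference, the kind of check being discussed here can be sketched in
Python roughly as follows. This is not the actual check_algorithmic code
from GTK+, only an illustration built on Python's standard unicodedata
module; the function name compose_single is made up for the example.

```python
import unicodedata
from typing import Optional


def compose_single(base: str, combining: str) -> Optional[str]:
    """Return the precomposed character for base + combining, if one exists.

    Hypothetical helper for illustration only: it applies NFC
    normalization and accepts the result only when it collapses to a
    single code point.
    """
    composed = unicodedata.normalize("NFC", base + combining)
    return composed if len(composed) == 1 else None


# Latin small a + COMBINING ACUTE ACCENT has a precomposed form (U+00E1).
latin = compose_single("a", "\u0301")

# Cyrillic small a (U+0430) + COMBINING ACUTE ACCENT has none, so the
# check fails and the sequence would be rejected.
cyrillic = compose_single("\u0430", "\u0301")
```

With a check like this, the Cyrillic case returns None, which is exactly
the situation bug #341341 is about.
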
>
> > So, here we can also check if the compose sequence matches the "valid"
> > compose sequence (a Cyrillic small 'a' with a combining acute is ok)
>
> There is no such thing as a "valid" compose sequence. As Unicode says,
>
> "All combining characters can be applied to any base character and can,
> in principle, be used with any script. As with other characters, the
> allocation of a combining character to one block or another identifies
> only its primary usage; it is not intended to define or limit the range
> of characters to which it may be applied. In the Unicode Standard, all
> sequences of character codes are permitted.
>
> This does not create an obligation on implementations to support all
> possible combinations equally well. Thus, while application of an
> Arabic annotation mark to a Han character or a Devanagari consonant is
> permitted, it is unlikely to be supported well in rendering or to make
> much sense."
>
> So there are indeed combinations that don't make so much sense, but
> enumerating those that make sense looks to me like unnecessary work:
>
> - It may be potentially very big, just see all the possible Vietnamese
>   combinations.
> - It will mostly never be complete, there will always be a language
>   (say, for instance, Tagbanwa) which nobody takes care of.
> - Why limit ourselves like this? It has been objected that a generic
>   support potentially leads to "odd" things like n̈̈̈, which is an n
>   with three diaereses on it. I don't think this is odd: if the user
>   pressed the dead_diaeresis key several times, I guess he indeed wanted
>   to have three diaereses, and if they don't show up, then the text
>   rendering engine is probably broken and may not for instance properly
>   show ẫ, which is needed for Vietnamese (actually, on my system,
>   pango shows both fine). Actually I think some mathematicians may even
>   have a use for n with several diaereses :)
>
> > How would we know which compose sequences are "valid"?
> > We can parse
> > parts of ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt
>
> It is _not_ a table of "valid" characters, it is only a partial test
> to check that the algorithm which transforms character + combining
> character into normalized precomposed form works correctly. Actually,
> a table that would hold _all_ the valid combinations would be very
> big. Just for the Vietnamese language, there would be 10*6 entries.
>
> Instead, it could be solved once and for all by systematically turning
> <dead_foo> <bar>, <combining_foo> <bar> and <Multi_key> <foo> <bar> into
> "Ubar Ucombining_foo". The only limitation is the font rendering engine,
> which seems to already do a pretty good job in all the cases: if I try
> to put a Tagbanwa accent on a Latin accent, it just works. If I try to
> put a combining Kannada vocalic on a Kannada character to which it isn't
> supposed to apply, it just shows the character and then the combining
> vocalic with a dotted circle.
>
> If the implementation can be generic enough that it works ASAP for every
> language in the world without more work, then why not do it?
>
> Samuel
> _______________________________________________
> gtk-i18n-list mailing list
> gtk-i18n-list@gnome.org
> http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
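
To make sure I read the generic proposal correctly, here is a rough
Python sketch of it: the base character followed by the combining
character(s), then normalized so that a precomposed form is used where
one exists. Everything in it is an assumption for illustration: the
table only lists a handful of dead keys, the name compose_generic is
invented, and applying the marks in the order the dead keys were pressed
is my guess at the intended behaviour.

```python
import unicodedata

# Illustrative subset only (assumption): a real table would cover every
# dead key, each mapped to its combining character.
DEAD_KEY_TO_COMBINING = {
    "dead_grave": "\u0300",
    "dead_acute": "\u0301",
    "dead_circumflex": "\u0302",
    "dead_tilde": "\u0303",
    "dead_diaeresis": "\u0308",
}


def compose_generic(dead_keys, base):
    """Turn <dead_foo>... <bar> into bar followed by the combining marks.

    Hypothetical helper: marks are appended in the order the dead keys
    were pressed, then NFC normalization picks a precomposed form when
    Unicode has one and leaves the combining sequence otherwise.
    """
    marks = "".join(DEAD_KEY_TO_COMBINING[key] for key in dead_keys)
    return unicodedata.normalize("NFC", base + marks)


# Precomposed form exists: U+00E1.
a_acute = compose_generic(["dead_acute"], "a")

# No precomposed form: Cyrillic a (U+0430) followed by U+0301.
cyr_a_acute = compose_generic(["dead_acute"], "\u0430")

# Two accents on a vowel, as in Vietnamese: U+1EAB.
a_circumflex_tilde = compose_generic(["dead_circumflex", "dead_tilde"], "a")

# n with three diaereses stays as four code points.
n_three = compose_generic(["dead_diaeresis"] * 3, "n")
```

No table of "valid" combinations is needed in this form; whether the
result looks right is then up to the rendering engine.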