And here is an article to get things going: http://blogs.gnome.org/simos/2008/02/20/keyboard-layout-for-combining-diacritics/
Simos

Simos Xenitellis wrote:
> Hi Samuel,
>
> I had some discussions on this and I think the problem can be resolved
> in the following way.
>
> To add combining diacritics there is no need for extra support in
> GTK+; this is something that is handled by the keyboard layouts (which
> are not handled by GTK+).
> What that means is that you need a keyboard layout that produces all
> those combining diacritics.
>
> The project for the keyboard layouts is xkeyboard-config,
> http://freedesktop.org/wiki/Software/XKeyboardConfig
>
> For your case of Tagbanwa, you would create a new keyboard layout.
> For the generic case of adding combining diacritics to different
> characters, a catch-all keyboard layout could be used.
>
> Currently, there is no GUI tool to create such keyboard layouts.
> On your Linux system, keyboard layouts live in /etc/X11/xkb/symbols/
> You can get an idea of how to modify an existing layout by looking at the
> files.
>
> If you would like to pursue this further, I would be happy to give you
> instructions.
>
> Simos
>
> On Feb 7, 2008 11:19 AM, Samuel Thibault <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> Simos wrote:
>>
>>> In bug #341341, Danilo talks about support for compose sequences that
>>> produce more than one Unicode character, as in
>>> COMBINING ACUTE + CYRILLIC LATIN A where no precomposed form exists.
>>> At the moment, the Xorg Compose file does not have such compose
>>> sequences. If we were to implement this in GTK+, I would suggest building up
>>> a new table of the form
>>>
>>> dead_acute, A, E, H, I, O, U, ... (assume all these are Cyrillic)
>>> dead_diaeresis, A, E, H, I, O, U, ... (assume all these are Cyrillic)
>>>
>> The problem is that this is very tedious for people who already have a
>> hard time making Linux suit their language (fonts, messages, locales,
>> ...), and such a table can potentially be very big.
>> For instance, in Vietnamese you may
>> need to put two accents on a vowel, and so you'd need to enumerate all
>> such possible combinations.
>>
>>
>>> In check_algorithmic, we currently check if the compose sequence can be
>>> normalised to a single Unicode character.
>>>
>> Which is necessary for proper string unicity/comparison, etc., yes.
>>
>>
>>> So, here we can also check if the compose sequence matches the "valid"
>>> compose sequence (a Cyrillic small 'a' with a combining acute is ok)
>>>
>> There is no such thing as a "valid" compose sequence. As Unicode says,
>>
>> "All combining characters can be applied to any base character and can,
>> in principle, be used with any script. As with other characters, the
>> allocation of a combining character to one block or another identifies
>> only its primary usage; it is not intended to define or limit the range
>> of characters to which it may be applied. In the Unicode Standard, all
>> sequences of character codes are permitted.
>>
>> This does not create an obligation on implementations to support all
>> possible combinations equally well. Thus, while application of an
>> Arabic annotation mark to a Han character or a Devanagari consonant is
>> permitted, it is unlikely to be supported well in rendering or to make
>> much sense."
>>
>> So there are indeed combinations that don't make much sense, but
>> enumerating those that do looks to me like unnecessary work:
>>
>> - It may potentially be very big; just see all the possible Vietnamese
>> combinations.
>> - It will most likely never be complete; there will always be a language
>> (say, for instance, Tagbanwa) which nobody takes care of.
>> - Why limit ourselves like this? It has been objected that generic
>> support potentially leads to "odd" things like n̈̈̈, which is an n
>> with three diaereses on it.
>> I don't think this is odd: if the user
>> pressed the dead_diaeresis key several times, I guess he indeed wanted
>> to have three diaereses, and if they don't show up, then the text
>> rendering engine is probably broken and may not, for instance, properly
>> show ẫ, which is needed for Vietnamese (actually, on my system,
>> Pango shows both fine). I even think some mathematicians may
>> have a use for n with several diaereses :)
>>
>>
>>> How would we know which compose sequences are "valid"? We can parse
>>> parts of ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt
>>>
>> It is _not_ a table of "valid" characters; it is only a partial test
>> to check that the algorithm which transforms character + combining
>> character into normalized precomposed form works correctly. Actually,
>> a table that would hold _all_ the valid combinations would be very
>> big. Just for the Vietnamese language, there would be 10*6 entries.
>>
>> Instead, it could be solved once and for all by systematically turning
>> <dead_foo> <bar>, <combining_foo> <bar> and <Multi_key> <foo> <bar> into
>> "Ubar Ucombining_foo". The only limitation is the font rendering engine,
>> which already seems to do a pretty good job in all cases: if I try
>> to put a Tagbanwa accent on a Latin accent, it just works. If I try to
>> put a combining Kannada vowel sign on a Kannada character to which it isn't
>> supposed to apply, it just shows the character and then the combining
>> vowel sign with a dotted circle.
>>
>> If the implementation can be generic enough that it works ASAN for every
>> language in the world without more work, then why not do it?
>>
>> Samuel
_______________________________________________
gtk-i18n-list mailing list
gtk-i18n-list@gnome.org
http://mail.gnome.org/mailman/listinfo/gtk-i18n-list
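Samuel's generic rule turns <dead_foo> <bar> into "bar" followed by the combining character for foo. The result is then normalized so that a precomposed character is used whenever Unicode has one. That behaviour can be sketched with Python's standard `unicodedata.normalize`. This is a minimal illustration under stated assumptions, not GTK+'s actual check_algorithmic code. The `DEAD_TO_COMBINING` table and the `compose()` helper are hypothetical. Only the dead-key form is shown, not the <combining_foo> or <Multi_key> variants. The sketch also assumes that stacked marks are appended in the order the dead keys were pressed.

```python
import unicodedata

# Hypothetical mapping from X dead-key keysym names to the Unicode
# combining character each one stands for (only a few shown here).
DEAD_TO_COMBINING = {
    "dead_grave": "\u0300",       # COMBINING GRAVE ACCENT
    "dead_acute": "\u0301",       # COMBINING ACUTE ACCENT
    "dead_circumflex": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "dead_tilde": "\u0303",       # COMBINING TILDE
    "dead_diaeresis": "\u0308",   # COMBINING DIAERESIS
}


def compose(dead_keys, base):
    """Generic rule (hypothetical helper): <dead_foo>... <bar> -> bar + combining_foo...

    Assumption: combining marks are appended in the order in which the
    dead keys were pressed. The result is NFC-normalized, so a single
    precomposed character is produced when Unicode defines one; otherwise
    the base + combining sequence is kept as-is, with no "valid" table.
    """
    marks = "".join(DEAD_TO_COMBINING[key] for key in dead_keys)
    return unicodedata.normalize("NFC", base + marks)


def codepoints(text):
    """Render a string as space-separated U+XXXX code points (ASCII-only output)."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


if __name__ == "__main__":
    examples = [
        (["dead_acute"], "a"),                     # Latin a: precomposed form exists
        (["dead_acute"], "\u0430"),                # Cyrillic a: no precomposed form
        (["dead_circumflex", "dead_tilde"], "a"),  # Vietnamese stacked accents
        (["dead_diaeresis"] * 3, "n"),             # n with three diaereses
    ]
    for keys, base in examples:
        result = compose(keys, base)
        print(f"{' '.join(keys)} + {codepoints(base)} -> {codepoints(result)}")
```

Run as a script, it reports U+00E1 for the Latin case. The Cyrillic case gives U+0430 U+0301: the pair is kept because no precomposed form exists. The two stacked Vietnamese accents give the single character U+1EAB (ẫ), and n with three diaereses stays four code points. Pressing the same two Vietnamese keys in the other order (tilde, then circumflex) yields U+00E3 U+0302 instead of ẫ. Both marks have the same canonical combining class, so their order is significant. That is why the key-order convention above is stated as an assumption.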