Re: Updating gtkimcontextsimple.c (bug #321896)

Simos Xenitellis Wed, 20 Feb 2008 08:43:46 -0800

And an article to get things going,

http://blogs.gnome.org/simos/2008/02/20/keyboard-layout-for-combining-diacritics/


Simos

Simos Xenitellis wrote:
> Hi Samuel,
>
> I had some discussions on this and I think the problem can be resolved
> in the following way.
>
> To add combining diacritics there is no need for extra support in
> GTK+; this is something that is handled by the keyboard layouts (which
> are not handled by GTK+).
> What that means is that you need a keyboard layout that produces all
> those combining diacritics.
>
> The project for the keyboard layouts is xkeyboard-config,
> http://freedesktop.org/wiki/Software/XKeyboardConfig
>
> For your case of Tagbanwa, you would create a new keyboard layout.
> For the generic case to add combining diacritics to different
> characters, a catch-all keyboard layout could be used.
>
> Currently, there is no GUI tool to create such keyboard layouts.
> In your Linux system, keyboard layouts live in /etc/X11/xkb/symbols/
> You can get an idea of how to modify an existing layout by looking at
> the files.
>
> If you would like to pursue this further, I would be happy to give you
> instructions.
>
> Simos
>
> On Feb 7, 2008 11:19 AM, Samuel Thibault <[EMAIL PROTECTED]> wrote:
>   
>> Hello,
>>
>> Simos wrote:
>>     
>>> In bug #341341, Danilo talks about support for compose sequences that
>>> produce more than one Unicode character, as in
>>> COMBINING ACUTE + CYRILLIC A where no precomposed form exists.
>>> At the moment, the Xorg Compose file does not have such compose
>>> sequences. If we were to implement this in GTK+, I would suggest
>>> building up a new table of the form
>>>
>>> dead_acute, A, E, H, I, O, U, ...  (assume all these cyrillic)
>>> dead_diaeresis, A, E, H, I, O, U, ...  (assume all these cyrillic)
>>>       
>> The problem is that this is very tedious for people who already have a
>> hard time making Linux suit their language (fonts, messages, locales,
>> ...) and the table can potentially be very big. For instance, in
>> Vietnamese you may need to put two accents on a vowel, and so you'd
>> need to enumerate all such possible combinations.
>>
>>     
>>> In check_algorithmic, we currently check if the compose sequence can be
>>> normalised to a single Unicode character.
>>>       
>> Which is necessary for proper string uniqueness/comparison etc., yes.
>>
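As a rough illustration of the check discussed above (not GTK+'s actual
check_algorithmic code, just a sketch of the idea using Python's standard
unicodedata module; the helper name composes_to_single is made up here),
one can test whether a base character plus a combining character
normalises to a single precomposed character:

```python
import unicodedata

def composes_to_single(base: str, combining: str) -> bool:
    """Return True if base + combining has a single-character NFC form."""
    return len(unicodedata.normalize("NFC", base + combining)) == 1

COMBINING_ACUTE = "\u0301"

# Latin small a + combining acute has a precomposed form (U+00E1).
print(composes_to_single("a", COMBINING_ACUTE))       # True
# Cyrillic small a (U+0430) + combining acute has no precomposed form.
print(composes_to_single("\u0430", COMBINING_ACUTE))  # False
```

The second case is the one at issue in this thread: the sequence is
perfectly usable text, it just stays as two code points.
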
>>     
>>> So, here we can also check if the compose sequence matches the "valid"
>>> compose sequence (a cyrillic small 'a' with a combining acute is ok)
>>>       
>> There is no such thing as a "valid" compose sequence. As Unicode says,
>>
>> "All combining characters can be applied to any base character and can,
>> in principle, be used with any script. As with other characters, the
>> allocation of a combining character to one block or another identifies
>> only its primary usage; it is not intended to define or limit the range
>> of characters to which it may be applied.  In the Unicode Standard, all
>> sequences of character codes are permitted.
>>
>> This does not create an obligation on implementations to support all
>> possible combinations equally well. Thus, while application of an
>> Arabic annotation mark to a Han character or a Devanagari consonant is
>> permitted, it is unlikely to be supported well in rendering or to make
>> much sense."
>>
>> So there are indeed combinations that don't make much sense, but
>> enumerating those that do looks to me like unnecessary work:
>>
>> - It may potentially be very big, just see all the possible Vietnamese
>>   combinations.
>> - It will almost never be complete, there will always be a language
>>   (say, for instance, Tagbanwa) which nobody takes care of.
>> - Why limit ourselves like this? It has been objected that generic
>>   support potentially leads to "odd" things like n̈̈̈, which is an n
>>   with three diaereses on it.  I don't think this is odd: if the user
>>   pressed the dead_diaeresis key several times, I guess he indeed wanted
>>   to have three diaereses, and if they don't show up, then the text
>>   rendering engine is probably broken and may not, for instance, properly
>>   show ẫ, which is needed for Vietnamese (actually, on my system,
>>   pango shows both fine).  Actually I think some mathematicians may even
>>   have a use for n with several diaereses :)
>>
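To make the two examples in the last bullet concrete, here is a small
sketch using only Python's standard unicodedata module (the variable
names are mine). It shows that an n with three diaereses stays as four
code points after NFC normalisation, while a + circumflex + tilde
collapses to the single precomposed Vietnamese letter:

```python
import unicodedata

DIAERESIS = "\u0308"   # COMBINING DIAERESIS
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT
TILDE = "\u0303"       # COMBINING TILDE

# n followed by three combining diaereses: nothing to precompose.
triple = unicodedata.normalize("NFC", "n" + DIAERESIS * 3)
# a + circumflex + tilde: precomposed as U+1EAB.
stacked_a = unicodedata.normalize("NFC", "a" + CIRCUMFLEX + TILDE)

print(len(triple))                       # 4
print([hex(ord(c)) for c in stacked_a])  # ['0x1eab']
```

Whether either string displays correctly is then purely up to the
rendering engine, as said above.
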
>>     
>>> How would we know which compose sequences are "valid"? We can parse
>>> parts of ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt
>>>       
>> It is _not_ a table of "valid" characters, it is only a partial test
>> to check that the algorithm which transforms character + combining
>> character into normalized precomposed form works correctly. Actually,
>> a table that would hold _all_ the valid combinations would be very
>> big. Just for the Vietnamese language, there would be 10*6 entries.
>>
>> Instead, it could be solved once and for all by systematically turning
>> <dead_foo> <bar>, <combining_foo> <bar> and <Multi_key> <foo> <bar> into
>> "Ubar Ucombining_foo". The only limitation is the font rendering engine,
>> which seems to already do a pretty good job in all the cases: if I try
>> to put a Tagbanwa accent on a Latin accent, it just works. If I try to
>> put a combining Kannada vocalic on a Kannada character to which it isn't
>> supposed to apply, it just shows the character and then the combining
>> vocalic with a dotted circle.
>>
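A minimal sketch of the generic approach proposed above, in Python
rather than in GTK+'s C, and under stated assumptions: the table
DEAD_TO_COMBINING is a small hypothetical subset, apply_dead_keys is a
made-up helper, and marks are applied in the order the dead keys were
pressed. The base character is emitted followed by the combining
characters, and normalisation then picks the precomposed form when one
exists:

```python
import unicodedata

# Hypothetical subset of a dead-key -> combining-character table.
DEAD_TO_COMBINING = {
    "dead_acute": "\u0301",
    "dead_diaeresis": "\u0308",
    "dead_circumflex": "\u0302",
    "dead_tilde": "\u0303",
}

def apply_dead_keys(dead_keys, base):
    """Turn <dead_foo>... <bar> into NFC(bar + combining_foo...)."""
    # Assumption: marks are attached in the order the keys were pressed.
    marks = "".join(DEAD_TO_COMBINING[key] for key in dead_keys)
    return unicodedata.normalize("NFC", base + marks)

# Precomposed form exists: a single code point comes out.
assert apply_dead_keys(["dead_acute"], "e") == "\u00e9"
# No precomposed form: base + combining character is kept as-is.
assert apply_dead_keys(["dead_acute"], "\u0430") == "\u0430\u0301"
# Two accents on one vowel, as needed for Vietnamese.
assert apply_dead_keys(["dead_circumflex", "dead_tilde"], "a") == "\u1eab"
```

No per-language table of "valid" combinations is needed in this scheme;
only the mapping from dead keys to combining characters is.
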
>> If the implementation can be generic enough that it works ASAP for every
>> language in the world without more work, then why not do it?
>>
>> Samuel
>>     

