2008-09-02 (Tue), 13:19 -0500, Adam Majer:
> Christian Perrier wrote:
> >> On Saturday 23 August 2008 at 19:59 -0500, Adam Majer wrote:
> >>> Package: gedit
> >>> Version: 2.22.3-1
> >>> Severity: normal
> >>>
> >>> The following UTF-8 string is not correctly handled in gedit,
> >>>
> >>> const char *unicode_insert = "?Э";
> >>>
> >>> The " and the ? characters are viewed as one character, making the
> >>> entire thing next to impossible to copy/paste/edit.
> >> Looks like an issue in pango, since it is not specific to gedit.
> >>
> >> Such things seem to happen a lot when using Tibetan characters, so this
> >> may or may not be intentional. I’d prefer to have the input of someone
> >> who uses them. Is there anyone on debian-i18n who’s more knowledgeable
> >> about Tibetan glyphs?
> >
> >
> > Adding Pema Geyleg and Tenzin Dendup, our fellow Dzongkha translation
> > coordinators, who certainly have skills about Tibetan-family scripts
> > (Dzongkha is one of these) and could maybe point you to people with
> > needed knowledge.
> >
> I'm sorry, but aren't we missing the entire point here? This is not
> about bad handling of some Tibetan characters. It is about bad handling
> of 3-byte UTF-8 characters.
>
> http://en.wikipedia.org/wiki/UTF-8
>
> So, the following characters should have the same problems,
>
> "ऄक
>
> "ঈউঊ
>
> "ਜਗਏ
>
> "ଜଁଂ
>
> "ஔ
>
> "ంఁః
>
> "ಂಖ
>
> "ഈഃ
>
> etc..
>
>
> I've put a Ascii " in front of all the different characters. In emacs,
> I'm able to select the " in front of these characters and copy it. vim
> under a UTF-8 gnome terminal also allows the " to be selected. The 2nd
> last line above (using icedove), I can't independently select the " but
> I can select the " and ಂ together and then remove the 2nd character.
>
> Maybe it is just my misunderstanding of UTF-8, I'm not sure. But at
> least my expected behaviour was being able to select 1 UTF-8 character
> at a time, even if linguistically it does not make any sense.

The Tibetan code in this case, U+0FA1, is NOT a character. It is a
Tibetan combining code that joins with other Tibetan codes to form a
Tibetan character. Unicode code points do not necessarily represent
characters. Selecting the combined character as a whole is more expected
than selecting its sub-parts (even when that is possible).

This issue is about the handling of Unicode combining sequences, not
about 3-byte UTF-8 encodings. In this case, Pango interprets a quotation
mark (") followed by the Tibetan code U+0FA1 (an invalid combination) as
one combined character. I'm not sure whether that is defined behavior.

-- 
Changwoo Ryu <[EMAIL PROTECTED]>
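
To illustrate the distinction between byte length and combining
behaviour, here is a minimal sketch using Python's standard unicodedata
module. It is only an illustration, not how Pango decides cursor
positions; the helper names describe and is_combining_mark are made up
for this example. It compares the ASCII quote, U+0FA1, and the two
Kannada code points from the second-to-last sample line above.

```python
import unicodedata


def describe(ch):
    """Return (code point, UTF-8 byte count, general category) for one code point."""
    return (f"U+{ord(ch):04X}", len(ch.encode("utf-8")), unicodedata.category(ch))


def is_combining_mark(ch):
    # General categories Mn, Mc and Me are the Unicode "Mark" classes,
    # i.e. code points meant to attach to a preceding base character.
    return unicodedata.category(ch).startswith("M")


if __name__ == "__main__":
    # ASCII quote, Tibetan U+0FA1, Kannada letter U+0C96, Kannada sign U+0C82
    for ch in ('"', "\u0fa1", "\u0c96", "\u0c82"):
        print(describe(ch), "combining mark" if is_combining_mark(ch) else "base")
```

U+0C96 and U+0C82 both take three bytes in UTF-8, yet only U+0C82 is a
mark, which would be consistent with the " being selectable on its own
in front of most of the sample lines but not in front of that one.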