Greetings,

Behdad Esfahbod said that this would be a good place to ask, even though the underlying software is Pango and not HarfBuzz. I am trying to resolve a bug in Inkscape:

  https://bugs.launchpad.net/inkscape/+bug/1282968

that involves this string in Telugu:

U+0C07,U+0C02,U+0C15,U+0C4D,U+200C,U+0C38,U+0C4D,U+0C15,U+0C47,U+0C2A,U+0C4D
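For anyone reproducing this, here is a small Python sketch that builds the string above and prints each character with its 1-based index and Unicode name, so references such as "character 8" below can be checked. It assumes only the standard `unicodedata` module; `describe` is a hypothetical helper, not part of Pango or Inkscape.

```python
import unicodedata

# The Telugu test string from the Inkscape bug, exactly as listed above.
CODEPOINTS = [0x0C07, 0x0C02, 0x0C15, 0x0C4D, 0x200C, 0x0C38,
              0x0C4D, 0x0C15, 0x0C47, 0x0C2A, 0x0C4D]
TEXT = "".join(chr(cp) for cp in CODEPOINTS)


def describe(text):
    """Return (1-based index, 'U+XXXX', Unicode name) for each character."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for i, ch in enumerate(text, start=1)]


if __name__ == "__main__":
    for index, code, name in describe(TEXT):
        print(f"{index:2d}  {code}  {name}")
```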

As will soon become apparent, I know next to nothing about Indic languages. Counting each "stacked glyph" as one glyph, the string is supposed to render as 6 glyphs:

https://bugs.launchpad.net/inkscape/+bug/1282968/+attachment/3989437/+files/correct-rendering.png

Pango breaks these 6 up into 3 logical clusters as 2:3:1.

First question: are the colon positions the proper places to insert kerning spaces?

For most languages, the logical clusters from pango_shape() have the property that each cluster begins with a character whose "is_cursor_position" attribute is set, and the attribute is not set on any other character in that cluster. (In European languages each logical cluster is usually one letter, possibly with one or more accents or similar modifiers.) That is almost the case here too, except that a second character within the 2nd logical cluster also has that bit set (character 8, U+0C15).
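The property described above can be written as a simple check. This is a hypothetical sketch: it assumes the per-character `is_cursor_position` flags (as reported in Pango's `PangoLogAttr`) and the character offset at which each logical cluster starts have already been extracted into plain Python lists. The function name and the example inputs are illustrative only, not real Pango output.

```python
def mid_cluster_cursor_positions(cursor_flags, cluster_starts):
    """Return 0-based character indices that break the usual invariant.

    cursor_flags:   one bool per character (True = is_cursor_position set).
    cluster_starts: 0-based offsets where each logical cluster begins.

    The invariant: the flag is set on every cluster start and nowhere else.
    An index is reported if it is flagged but not a cluster start, or is a
    cluster start but not flagged.
    """
    starts = set(cluster_starts)
    problems = []
    for i, flag in enumerate(cursor_flags):
        if flag != (i in starts):
            problems.append(i)
    return problems


if __name__ == "__main__":
    # Toy input: two clusters starting at 0 and 2, plus an extra flag at 4,
    # mimicking "a second character within a cluster also has that bit set".
    print(mid_cluster_cursor_positions(
        [True, False, True, False, True, False], [0, 2]))
```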

Behdad referred me to this document:

http://www.w3cindia.in/Indic-req-draft/Indic-layout-requirements.html#letter-spac

which says that aksharas are supposed to move around as a block. However, kerning spaces are generally inserted at cursor positions, and I don't know how to reconcile the two, so...

Second question: Is the 2nd logical cluster returned by Pango something larger than an akshara?

Third question: in a text editor for this sort of language, are cursor positions restricted to akshara boundaries, or does the cursor move within an akshara, stopping at each "stacked glyph"? Currently in Inkscape one can delete Unicode characters off the tail of a logical cluster, but there is no way to move the cursor within it. This text can only be entered with control codes (^U0C07<return>^U0C02<return> etc.).

Fourth and last question: the string includes U+200C, the "zero width non-joiner" (ZWNJ). There is no corresponding glyph; all it does is prevent two adjacent characters from merging into a different kind of glyph. In terms of cursor motion, if one were moving across the text from left to right with "right arrow" presses, would one expect the cursor to stay in the same spot for two presses, as when moving across any other zero-width character, or should its presence not affect cursor motion at all?
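For reference, Python's standard `unicodedata` module shows that U+200C is a format character (general category `Cf`), not a letter or mark, and that it is the only such character in this string. The helper `format_char_indices` is hypothetical and only locates such invisible characters; it does not implement any particular cursor-motion policy.

```python
import unicodedata

ZWNJ = "\u200c"

# Same Telugu string as in the bug report.
TEXT = "".join(chr(cp) for cp in [0x0C07, 0x0C02, 0x0C15, 0x0C4D, 0x200C,
                                  0x0C38, 0x0C4D, 0x0C15, 0x0C47, 0x0C2A,
                                  0x0C4D])


def format_char_indices(text):
    """Return 0-based indices of invisible format characters (category Cf)."""
    return [i for i, ch in enumerate(text) if unicodedata.category(ch) == "Cf"]


if __name__ == "__main__":
    print(unicodedata.name(ZWNJ), unicodedata.category(ZWNJ))
    # prints "ZERO WIDTH NON-JOINER Cf"
    print(format_char_indices(TEXT))
    # prints [4], i.e. character 5 in 1-based numbering
```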

Thank you,

David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
HarfBuzz mailing list
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
