Hi Khaled,

On 07/21/2012 05:49 AM, Khaled Hosny wrote:
> How do I map output glyphs back to input characters? I assume I've to
> use clusters for that, but I can't make much sense of the cluster
> numbers I'm seeing and don't seem to find any explanation for them.

When you add text to an hb_buffer_t, you set a cluster number for each
character. The hb_buffer_add_utf* functions implicitly use the index into
the input string as the cluster. That is, when using the UTF-8 version, the
cluster values are UTF-8 byte indices.

Note that hb-view/hb-shape by default use UTF-32 cluster numbers (i.e.
character count instead of byte count). You can change that using
--utf8-clusters.

The shaping process implicitly segments the input text and the output
glyphs into a series of clusters. So, for LTR text, you can think of a
first cluster followed by a second cluster, followed by a third cluster,
and so on, where each cluster contains a number of characters and a number
of glyphs. After shaping, the hb_glyph_info_t::cluster member of each glyph
is simply set to the minimum value of that member among all the characters
that belong to the glyph's cluster. For RTL it's similar, though in the
reverse direction.

Quick example: if you add the text "differ", the characters initially get
cluster values 0,1,2,3,4,5 respectively. After shaping, if the 'ff'
ligature was formed, you will get five glyphs with cluster values
0,1,2,4,5. This means that the two characters that originally had cluster
values 2 and 3 are represented by the sole glyph with cluster value 2.

Hope that helps.

behdad

> Regards,
> Khaled
_______________________________________________
HarfBuzz mailing list
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
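
For reference, a minimal sketch of the glyph-to-character mapping described in the reply, in plain Python with no HarfBuzz calls. The helper name cluster_char_ranges is hypothetical, and the cluster values are copied from the "differ" example rather than produced by a real shaper. It assumes the cluster values are in the same units as the indices into the text (character indices here) and that each glyph carries the minimum cluster value of the characters it covers.

```python
def cluster_char_ranges(glyph_clusters, text_len):
    """Map each output glyph to the half-open range of input indices it covers.

    glyph_clusters: cluster value of each output glyph, in output order.
    text_len: length of the input, in the same units as the cluster values.
    """
    # Distinct cluster values in ascending order; each marks where a cluster starts.
    starts = sorted(set(glyph_clusters))
    # A cluster ends where the next one starts; the last one ends with the text.
    ends = dict(zip(starts, starts[1:] + [text_len]))
    return [(c, ends[c]) for c in glyph_clusters]


text = "differ"
# Hypothetical shaper output: the values from the "differ" example, 'ff' ligated.
clusters_after_shaping = [0, 1, 2, 4, 5]

for start, end in cluster_char_ranges(clusters_after_shaping, len(text)):
    print(f"glyph with cluster {start} covers {text[start:end]!r}")
```

Because the ranges are derived from the sorted set of cluster values, the same helper should also work when the glyphs come back in reverse order, as in the RTL case. Glyphs that share a cluster value simply get the same range.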
