On Sun, Jul 22, 2012 at 11:37:23PM -0400, Behdad Esfahbod wrote:
> Hi Khaled,
>
> On 07/21/2012 05:49 AM, Khaled Hosny wrote:
> > How do I map output glyphs back to input characters? I assume I've to
> > use clusters for that, but I can't make much sense of the cluster
> > numbers I'm seeing and don't seem to find any explanation for them.
>
> When you add text to a hb_buffer_t, you set a cluster number for each
> character. The functions hb_buffer_add_utf* implicitly use the index into the
> input string for the cluster. Ie. when using the UTF-8 version, UTF-8 indices
> are used.
>
> Note that hb-view/hb-shape by default use UTF-32 cluster numbers (ie.
> character-count instead of byte-count). You can change that using
> --utf8-clusters.

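To make the byte-count versus character-count distinction above concrete, here is a minimal sketch in plain C. It does not call HarfBuzz at all; utf8_char_offsets is a hypothetical helper written only for illustration, and it assumes the input is well-formed UTF-8. For each character it records the byte offset where that character starts, so the array index is the character-count style number and the stored value is the byte-count style number.

```c
#include <stddef.h>

/* Hypothetical helper, not a HarfBuzz function.
 * Walks a NUL-terminated, well-formed UTF-8 string and stores the byte
 * offset of each character's first byte in offsets[] (up to max entries).
 * Returns the total number of characters found. */
static size_t utf8_char_offsets(const char *s, size_t *offsets, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Bytes of the form 10xxxxxx continue the previous character. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (n < max)
                offsets[n] = i;
            n++;
        }
    }
    return n;
}
```

For the three-character string "a\xC3\xA9" "b" (an 'a', a two-byte 'é', and a 'b'), the characters are numbered 0, 1, 2 by character count, while their byte offsets are 0, 1, 3.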
I’m using UTF-16 (playing with porting LibreOffice to HarfBuzz), so how are
surrogate pairs handled?

> The shaping process implicitly segments the input text + output glyphs in a
> series of clusters. So you can think of, for LTR text, first cluster followed
> by second cluster, followed by third cluster, etc, where each cluster contains
> a number of characters and a number of glyphs.
>
> Now, the hb_glyph_info_t::cluster member after shaping simply points to the
> minimum value of that member for all the characters that belong to the
> cluster.
>
> For RTL it's similar, though in reverse direction.
>
> Quick example. If you add text for "differ", then initially characters get
> cluster values 0,1,2,3,4,5 respectively. After shaping, if the 'ff' ligature
> was formed, you will get five glyphs, with cluster values 0,1,2,4,5. This
> means that the two characters that originally had cluster values 2 and 3 are
> represented by the sole glyph having the cluster value 2.
>
> Hope that helps.

Thanks Behdad, this was very helpful.

Regards,
Khaled
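For reference, the "differ" example above can be turned into a small mapping routine. This is only a sketch under stated assumptions: it is plain C with no HarfBuzz calls, char_range_t and clusters_to_ranges are hypothetical names invented here, and it assumes LTR output where the cluster values never decrease from one glyph to the next. In a real program the cluster values would be read from the hb_glyph_info_t array returned by hb_buffer_get_glyph_infos() after shaping.

```c
#include <stddef.h>

/* Hypothetical type for illustration; not part of HarfBuzz. */
typedef struct {
    unsigned int start; /* first input index covered by the glyph's cluster */
    unsigned int end;   /* one past the last input index covered */
} char_range_t;

/* Hypothetical helper, assuming LTR output with non-decreasing clusters.
 * Each glyph's cluster covers the input from its own cluster value up to
 * the next larger cluster value, or up to text_len for the last cluster.
 * Glyphs that share a cluster value receive the same range. */
static void clusters_to_ranges(const unsigned int *clusters, size_t n_glyphs,
                               unsigned int text_len, char_range_t *ranges)
{
    for (size_t i = 0; i < n_glyphs; i++) {
        unsigned int end = text_len;
        /* Find where the next cluster begins. */
        for (size_t j = i + 1; j < n_glyphs; j++) {
            if (clusters[j] > clusters[i]) {
                end = clusters[j];
                break;
            }
        }
        ranges[i].start = clusters[i];
        ranges[i].end = end;
    }
}
```

With the cluster values 0,1,2,4,5 and a text length of 6, the third glyph maps to the range [2, 4), i.e. the two 'f' characters, and every other glyph maps to a single character.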
