OK, thanks! In that case, the example at https://github.com/behdad/harfbuzz/blob/master/src/sample.py should probably be fixed, because it just uses string.encode('utf-x'), which is confusing.
On Thu, Jun 16, 2016 at 10:44 PM, Khaled Hosny <[email protected]> wrote:

> On Thu, Jun 16, 2016 at 09:35:03PM -0400, Kelvin Ma wrote:
> > When I run a simple harfbuzz shaping like
> >
> > string = 'In begíffi our '
> > > utfstring = string.encode('utf-8')
> > >
> > > buf = hb.buffer_create()
> > > hb.buffer_add_utf8(buf, utfstring, 0, -1)
> > > hb.buffer_guess_segment_properties(buf)
> > >
> > > hb.shape(font, buf, [])
> > > infos = hb.buffer_get_glyph_infos(buf)
> > > positions = hb.buffer_get_glyph_positions(buf)
> > >
> >
> > I get
> >
> > len(string) = 15
> > len(infos) = 13
> > len(positions) = 13
> >
> > which makes sense, three glyphs became one so 15 characters makes 13
> > glyphs. But the cluster values are wrong because they don’t line up with
> > the character indexes any more (because of the accented character).
> >
> > But then when I change it to utf-16
> >
> > string = 'In begíffi our '
> > > utfstring = string.encode('utf-16')
>
> You need here a list of UTF-16 code units, but string.encode('utf-16')
> just gives you UTF-16 bytes array. You need something like:
>
> utfstring = [int.from_bytes(c.encode("utf-16be"), byteorder='big') for c
> in string]
>
> (This does not handle non-BMP characters that will be encoded as two
> UTF-16 code units, but you get the idea).
>
> > > hb.buffer_add_utf16(buf, utfstring, 0, -1)
>
> And pass the list length here (or add null character at the end of the
> list).
>
> > And when I change it to utf-32, which this post
> > <http://comments.gmane.org/gmane.comp.freedesktop.harfbuzz/1836> says
> > should make it give character counts, but
>
> Same as above.
>
> Regards,
> Khaled
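
For reference, Khaled's suggestion above can be sketched in plain Python without touching the bindings at all. This is only a rough sketch: the helper names utf16_code_units and utf32_code_points are made up for illustration and are not part of HarfBuzz or sample.py. Unlike the one-liner in the quote, the UTF-16 helper also splits non-BMP characters into surrogate pairs. How the resulting list is handed to hb.buffer_add_utf16 depends on the bindings, so that call is left out here.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    # Encode with an explicit byte order so that no BOM is prepended.
    data = text.encode("utf-16-le")
    # Every code unit is 2 bytes; unpack them as little-endian unsigned shorts.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def utf32_code_points(text):
    """Return the code points of text as a list of ints (hypothetical helper)."""
    return [ord(ch) for ch in text]

text = 'In begíffi our '
units8 = text.encode('utf-8')        # bytes: the accented character takes 2
units16 = utf16_code_units(text)     # one code unit per character here
units32 = utf32_code_points(text)    # always one entry per character

print(len(text), len(units8), len(units16), len(units32))

# Per the thread, pass len(units16) (not -1) as the length when adding the
# list to the buffer, or append a 0 to the end of the list.
```

With the UTF-8 input the accented character occupies two bytes, which would explain why the cluster values stopped lining up with the Python string indexes.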
