[HarfBuzz] Harfbuzz on Windows
Hi Behdad, I want to build HarfBuzz for Windows... What should I use as a replacement for the Unicode functions? Is there any native support in Windows that I can use to provide the Unicode functions? -- Samiullah Khawaja ___ HarfBuzz mailing list HarfBuzz@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/harfbuzz
[HarfBuzz] HarfBuzz.old backend in new HarfBuzz
Hi,

Quick heads up that I just pushed a new backend in HarfBuzz that uses the HarfBuzz.old shaper. All code is in-tree, so no dependencies. You can choose the old shaper by its name, "old", using --shaper or hb_shape_full().

Pretty much like the Uniscribe and CoreText backends, this new backend is primarily for testing, and may be removed in the future (after I have convinced everyone to move to the real HarfBuzz).

Coming soon... iculayout backend perhaps. And let's see if Jonathan will do a DWrite backend any time soon.

Cheers,
behdad
[HarfBuzz] harfbuzz-ng: Branch 'master'
 src/hb-unicode-private.hh |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

New commits:
commit 35bdab3cf1f0836807160e3ce93766c321b32e8c
Author: Behdad Esfahbod <beh...@behdad.org>
Date:   Wed Jul 25 11:59:52 2012 -0400

    Minor

diff --git a/src/hb-unicode-private.hh b/src/hb-unicode-private.hh
index 0ba2fcc..1ce5adc 100644
--- a/src/hb-unicode-private.hh
+++ b/src/hb-unicode-private.hh
@@ -151,7 +151,7 @@ _hb_unicode_is_zero_width (hb_codepoint_t ch)
   return ((ch & ~0x007F) == 0x2000 && (
           hb_in_ranges<hb_codepoint_t> (ch,
                                         0x200B, 0x200F,
                                         0x202A, 0x202E,
-                                        0x2060, 0x2063) ||
+                                        0x2060, 0x2064) ||
           (ch == 0x2028)
          )) || unlikely (ch == 0x0009
                       || ch == 0x00AD ||
Re: [HarfBuzz] Multiple substitution and mark positioning
This also happens with Arabic Typesetting I assume?

b

On 06/12/2012 06:31 AM, Khaled Hosny wrote:
> I’m not sure if this is related, but I now get no mkmk positioning when
> the marks are “inserted” using multiple substitution. For example, “للّٰه”
> is positioned correctly, while “لله” is not, though it has the same mark
> glyphs except that they are added by multiple substitution.
>
> [uni0647.fina_Lellah=4+721|uni0670=1@-267,-162|uni0651=1@-277,-440|uni0644.medi_Lellah=1+473|uni0644.init_Lellah=0+319]
>
> vs.:
>
> [uni0647.fina_Lellah=2+721|uni0670=1@-245,-440|uni0651=1@-277,-440|uni0644.medi_Lellah=1+473|uni0644.init_Lellah=0+319]
>
> Regards,
> Khaled
>
> On Fri, Jun 08, 2012 at 10:14:19PM -0400, Behdad Esfahbod wrote:
> > Hi Khaled and others,
> >
> > I fixed this, among other things, including a major mlig and mkmk
> > regression. Please test.
> >
> > behdad
> >
> > On 05/12/2012 08:54 AM, Khaled Hosny wrote:
> > > Hi all,
> > >
> > > There seems to be a difference between HarfBuzz and Uniscribe in how
> > > they handle mark positioning when there is multiple glyph
> > > substitution, namely HB seems to apply the mark to the last component
> > > while USP applies it to the first component. In other words, if there
> > > is a base → base₁ base₂ substitution, the sequence base mark will be
> > > rendered as if it was base₁ base₂ mark with HB, but as base₁ mark base₂
> > > with USP.
> > >
> > > Using hb-shape with the “uniscribe” shaper, the word “سَتا” and the
> > > Arabic Typesetting font, I get
> > >
> > > [uniFE8E=3+343|uniFE98=2+376|uni064E=0@501,-260|uni0640.curvehalf=0@,34+152|uniFEB3=0@,34+840]
> > >
> > > but with the “ot” shaper, I get:
> > >
> > > [uniFE8E=3+343|uniFE98=2+376|uni064E=0@-11,-310|uni0640.curvehalf=0@,34+152|uniFEB3=0@,34+840]
> > >
> > > Though the glyph string is the same, the position of the mark is
> > > clearly different.
> > >
> > > (Background: I need this to contextually insert tatweel to avoid mark
> > > collision in “crowded” places, but with the difference between both
> > > engines this can’t be reliably done without breaking mark positioning
> > > in one of them.)
> > >
> > > Regards,
> > > Khaled
Re: [HarfBuzz] Harfbuzz Sinhala (si) script support status update
On Mon, 2012-07-23 at 02:36 +1000, Harshula wrote:

Hi Behdad and Jonathan,

1) I did a quick test of the latest commits. Basic Sinhala shaping seems to have improved for the Bhashitha font (IIRC, the original version was for Windows) and gone backwards with the GNU Free Font and LKLUG fonts.

The following file contains strings that represent the minimal shaping support required:
http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.txt

This is what the output should look like:
http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.png

FreeSerif font: http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip
LKLUG font: http://sinhala.sourceforge.net/files/lklug.qa.ttf

Both Pango and ICU are able to shape the content of icu-sinhala-rendering.txt correctly using either the FreeSerif or the LKLUG font.

Here are more details about the problem. The new shaper renders කො (ko) incorrectly with the FreeSerif and LKLUG fonts but renders it correctly with the Bhashitha font (IIRC, originated from Windows). The old shaper renders the string correctly using all three fonts.

String: කො
Unicode Sequence: U+0D9A,U+0DDC (consonant + split dependent vowel)
U+0DDC = U+0DD9 + U+0DCF

Using new shaper
Bhashitha: [uni0DD9=0+1091|uni0D9A=0+1744|uni0DCF=0+629]
FreeSerif: [e2_sinh=0+707|ka_sinh=0+915|o2_sinh=0+1308]
LKLUG:     [uni0DD9=0+727|uni0D9A=0+913|uni0DDC=0+1329]

Using old shaper
Bhashitha: [uni0DD9=0+1091|uni0D9A=0+1744|uni0DCF=0+629]
FreeSerif: [e2_sinh=0+707|ka_sinh=0+915|aa2_sinh=0+336]
LKLUG:     [uni0DD9=0+727|uni0D9A=0+913|uni0DCF=0+356]

NOTE: It appears LKLUG (using 'liga') and FreeSerif (using multiple subs) construct U+0DDC from U+0DD9 and U+0DCF. However, Bhashitha appears to deconstruct U+0DDC to form U+0DCF. I'm not good with font rule construction, so it would be advisable for you to inspect the font for accurate details.

Thanks again for adding the old shaper!!!

cya,
#
Re: [HarfBuzz] Harfbuzz on Windows
On 07/25/2012 09:31 AM, Samiullah Khawaja wrote:
> Hi Behdad, I want to build harfbuzz for windows... What replacement of
> unicode functions should I use? Is there any native support in windows
> that I can use to provide the unicode functions.

ICU is (more or less) the recommended choice. However, I actually had less trouble getting glib to work on Windows, but that might be the fault of MinGW.

Unfortunately both ICU and glib are huge frameworks with their own set of dependencies. A stripped-down standalone library that provides a Unicode character database and nothing more would be great to have. I implemented HarfBuzz support in libass, but projects like Aegisub or VLC still ship without HarfBuzz on Windows because the ICU/glib requirement is such a pain. :(

Best regards
Grigori
Re: [HarfBuzz] Harfbuzz on Windows
On 07/25/2012 04:58 PM, Grigori Goronzy wrote:
> Unfortunately both ICU and glib are huge frameworks with their own set
> of dependencies. A stripped-down standalone library that provides a
> unicode character database and nothing more would be great to have.

I'm planning on shipping a bare-bones set of Unicode data internally in HarfBuzz for such use cases. No concrete timeline right now, but it's an afternoon's worth of hacking.

behdad
Re: [HarfBuzz] Multiple substitution and mark positioning
This only applies to the marks that result from multiple substitution, i.e. in Amiri the middle lam of لله is substituted with lam shadda smallalef. I don’t think Arabic Typesetting has something like that.

On Wed, Jul 25, 2012 at 01:58:08PM -0400, Behdad Esfahbod wrote:
> This also happens with Arabic Typesetting I assume?
>
> b
Re: [HarfBuzz] Multiple substitution and mark positioning
On 25/7/12 22:13, Khaled Hosny wrote:
> This only applies to the marks that result from multiple substitution,
> i.e. in Amiri the middle lam of لله is substituted with lam shadda
> smallalef. I don’t think Arabic Typesetting has something like that.

Just wondering (without checking the code yet...) - is it possible that we're failing to set the glyph category from GDEF properly for marks that are inserted by GSUB rules like this, and as a result the GPOS lookups don't match as expected?
Re: [HarfBuzz] Multiple substitution and mark positioning
On 07/25/2012 06:14 PM, Jonathan Kew wrote:
> On 25/7/12 22:13, Khaled Hosny wrote:
> > This only applies to the marks that result from multiple substitution,
> > i.e. in Amiri the middle lam of لله is substituted with lam shadda
> > smallalef. I don’t think Arabic Typesetting has something like that.
>
> Just wondering (without checking the code yet...) - is it possible that
> we're failing to set the glyph category from GDEF properly for marks
> that are inserted by GSUB rules like this, and as a result the GPOS
> lookups don't match as expected?

No, I think I know where the problem is. It's somewhere in the lig_id / lig_comp matching code... Lemme see.

b
[HarfBuzz] harfbuzz-ng: Branch 'master'
 src/hb-ot-layout-gpos-table.hh |   10 +-
 src/hb-ot-layout-private.hh    |   23 +--
 2 files changed, 26 insertions(+), 7 deletions(-)

New commits:
commit a3313e54008167e415b72c780ca7b9cda958d07e
Author: Behdad Esfahbod <beh...@behdad.org>
Date:   Wed Jul 25 18:37:51 2012 -0400

    [GPOS] Fix MarkMarkPos applied to results of MultipleSubst

    This was broken as a result of 7b84c536c10ab90ed96a033d88e9ad232d46c5b8.
    As Khaled reported, MarkMark positioning was broken with glyphs
    resulting from a MultipleSubst.  Fixed.  Test with the ALLAH character
    in Amiri.

diff --git a/src/hb-ot-layout-gpos-table.hh b/src/hb-ot-layout-gpos-table.hh
index e95aa3e..5b71407 100644
--- a/src/hb-ot-layout-gpos-table.hh
+++ b/src/hb-ot-layout-gpos-table.hh
@@ -1169,11 +1169,11 @@ struct MarkMarkPosFormat1
     unsigned int j = skippy_iter.idx;
 
     /* Two marks match only if they belong to the same base, or same component
-     * of the same ligature.  That is, the component numbers must match, and
-     * if those are non-zero, the ligid number should also match. */
-    if ((get_lig_comp (c->buffer->info[j]) != get_lig_comp (c->buffer->cur())) ||
-        (get_lig_comp (c->buffer->info[j]) > 0 &&
-         get_lig_id (c->buffer->info[j]) != get_lig_id (c->buffer->cur())))
+     * of the same ligature.  That is, the lig_id numbers must match, and
+     * if those are non-zero, the lig_comp number should also match. */
+    if ((get_lig_id (c->buffer->info[j]) != get_lig_id (c->buffer->cur())) ||
+        (get_lig_id (c->buffer->info[j]) > 0 &&
+         get_lig_comp (c->buffer->info[j]) != get_lig_comp (c->buffer->cur())))
       return TRACE_RETURN (false);
 
     unsigned int mark2_index = (this+mark2Coverage) (c->buffer->info[j].codepoint);
diff --git a/src/hb-ot-layout-private.hh b/src/hb-ot-layout-private.hh
index 7a1c7e3..ba375aa 100644
--- a/src/hb-ot-layout-private.hh
+++ b/src/hb-ot-layout-private.hh
@@ -68,8 +68,27 @@ _hb_ot_layout_skip_mark (hb_face_t    *face,
  * GSUB/GPOS
  */
 
-/* unique ligature id */
-/* component number in the ligature (0 = base) */
+/* lig_id / lig_comp
+ *
+ * When a ligature is formed:
+ *
+ *   - The ligature glyph and any marks in between all get a unique lig_id,
+ *   - The ligature glyph will get lig_comp = 0
+ *   - The marks get lig_comp > 0, reflecting which component of the ligature
+ *     they were applied to.
+ *   - This is used in GPOS to attach marks to the right component of a ligature
+ *     in MarkLigPos.
+ *
+ * When a multiple-substitution is done:
+ *
+ *   - All resulting glyphs will have lig_id = 0,
+ *   - The resulting glyphs will have lig_comp = 0, 1, 2, ... respectively.
+ *   - This is used in GPOS to attack marks to the first component of a
+ *     multiple substitution in MarkBasePos.
+ *
+ * The numbers are also used in GPOS to do mark-to-mark positioning only
+ * to marks that belong to the same component of a ligature in MarkMarPos.
+ */
 static inline void
 set_lig_props (hb_glyph_info_t &info, unsigned int lig_id, unsigned int lig_comp)
 {
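To make the effect of this change easier to see, here is a small standalone sketch of the two matching rules. It is not HarfBuzz code: the LigProps struct and both function names are made up for illustration, and only the comparison logic follows the diff above. It assumes, per the new comment block, that the glyphs produced by a multiple substitution all carry lig_id = 0 and lig_comp = 0, 1, 2, ...

```cpp
#include <cassert>

// Hypothetical stand-in for the two per-glyph values the commit talks about.
// In HarfBuzz they are stored inside hb_glyph_info_t; this struct is only
// for illustration.
struct LigProps {
  unsigned lig_id;    // 0 for the results of a multiple substitution
  unsigned lig_comp;  // component number
};

// Rule before the fix: the component numbers must match, and if they are
// non-zero the lig_id must match too.
bool marks_match_old(const LigProps &a, const LigProps &b) {
  if (a.lig_comp != b.lig_comp) return false;
  if (a.lig_comp > 0 && a.lig_id != b.lig_id) return false;
  return true;
}

// Rule after the fix: the lig_id numbers must match, and if they are
// non-zero the lig_comp must match too.
bool marks_match_new(const LigProps &a, const LigProps &b) {
  if (a.lig_id != b.lig_id) return false;
  if (a.lig_id > 0 && a.lig_comp != b.lig_comp) return false;
  return true;
}
```

With the Amiri case from this thread (lam replaced by lam, shadda, smallalef), the two marks would be {0, 1} and {0, 2}: the old rule rejects the pair because the components differ, while the new rule accepts it because both have lig_id 0. Two marks on different components of a real ligature, say {7, 1} and {7, 2}, are still rejected by the new rule.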
Re: [HarfBuzz] Multiple substitution and mark positioning
Fixed.

On 07/25/2012 05:13 PM, Khaled Hosny wrote:
> This only applies to the marks that result from multiple substitution,
> i.e. in Amiri the middle lam of لله is substituted with lam shadda
> smallalef. I don’t think Arabic Typesetting has something like that.
Re: [HarfBuzz] Multiple substitution and mark positioning
Thanks Behdad.

On Wed, Jul 25, 2012 at 06:39:29PM -0400, Behdad Esfahbod wrote:
> Fixed.
Re: [HarfBuzz] Mapping output glyphs back to input character
On Sun, Jul 22, 2012 at 11:37:23PM -0400, Behdad Esfahbod wrote:
> Hi Khaled,
>
> On 07/21/2012 05:49 AM, Khaled Hosny wrote:
> > How do I map output glyphs back to input characters? I assume I have
> > to use clusters for that, but I can't make much sense of the cluster
> > numbers I'm seeing and don't seem to find any explanation for them.
>
> When you add text to a hb_buffer_t, you set a cluster number for each
> character. The functions hb_buffer_add_utf* implicitly use the index
> into the input string for the cluster. I.e. when using the UTF-8
> version, UTF-8 indices are used. Note that hb-view/hb-shape by default
> use UTF-32 cluster numbers (i.e. character-count instead of byte-count).
> You can change that using --utf8-clusters.

I’m using UTF-16 (playing with porting LibreOffice to HarfBuzz), so how are surrogate pairs handled?

> The shaping process implicitly segments the input text + output glyphs
> into a series of clusters. So you can think of, for LTR text, the first
> cluster followed by the second cluster, followed by the third cluster,
> etc., where each cluster contains a number of characters and a number of
> glyphs. Now, the hb_glyph_info_t::cluster member after shaping simply
> points to the minimum value of that member for all the characters that
> belong to the cluster. For RTL it's similar, though in reverse
> direction.
>
> Quick example. If you add text for "differ", then initially characters
> get cluster values 0,1,2,3,4,5 respectively. After shaping, if the 'ff'
> ligature was formed, you will get five glyphs, with cluster values
> 0,1,2,4,5. This means that the two characters that originally had
> cluster values 2 and 3 are represented by the sole glyph having the
> cluster value 2.
>
> Hope that helps.

Thanks Behdad, this was very helpful.

Regards,
Khaled
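The "differ" example above can be turned into a small sketch of how a client might recover, for each output glyph, the range of input indices it covers. This is standalone C++ and does not call HarfBuzz; the function name cluster_ranges_ltr is made up, and it assumes LTR text whose input cluster values were simply 0..n_chars-1, so the glyph cluster values never decrease.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// For LTR text whose characters were added with cluster values
// 0..n_chars-1, return for each glyph the half-open range [first, last)
// of input indices that its cluster covers.
// Assumption: glyph_clusters is non-decreasing (LTR, no reordering).
std::vector<std::pair<unsigned, unsigned>>
cluster_ranges_ltr(const std::vector<unsigned> &glyph_clusters, unsigned n_chars) {
  std::vector<std::pair<unsigned, unsigned>> ranges;
  for (std::size_t i = 0; i < glyph_clusters.size(); ++i) {
    const unsigned first = glyph_clusters[i];
    // The cluster ends where the next different cluster value begins,
    // or at the end of the text for the last cluster.
    unsigned last = n_chars;
    for (std::size_t j = i + 1; j < glyph_clusters.size(); ++j) {
      if (glyph_clusters[j] != first) {
        last = glyph_clusters[j];
        break;
      }
    }
    assert(first < last && last <= n_chars);
    ranges.emplace_back(first, last);
  }
  return ranges;
}
```

For the glyph clusters 0,1,2,4,5 and six input characters, the third glyph maps to the range [2, 4), i.e. the two 'f' characters. Glyphs that share a cluster value (one character becoming several glyphs) all get the same range.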
Re: [HarfBuzz] Harfbuzz on Windows
Hi,

What about Uniscribe? Would it be able to replace the ICU/glib dependency? I thought I could get rid of the ICU/glib dependency by using Uniscribe. Is my assumption wrong? What does the hb-uniscribe.cc file in the harfbuzz-ng source do?

Are there any Windows APIs that I could write a layer on top of to replace the ICU/glib dependency?

Thanks for the help.

Sami

On Thu, Jul 26, 2012 at 2:11 AM, Behdad Esfahbod wrote:
> I'm planning on shipping a barebone set of Unicode data internally in
> HarfBuzz for such usecases. No concrete timeline right now, but it's an
> afternoon's worth of hacking.
>
> behdad
[HarfBuzz] harfbuzz-ng: Branch 'master'
 src/hb-old.cc                |   51 +--
 src/hb-old/harfbuzz-shaper.h |    1
 src/hb-uniscribe.cc          |    5 ++--
 3 files changed, 44 insertions(+), 13 deletions(-)

New commits:
commit 91e721ea8693205f4f738bca97a5055ee75cf463
Author: Behdad Esfahbod <beh...@behdad.org>
Date:   Wed Jul 25 19:20:34 2012 -0400

    [hb-old] Fix clusters

    Unlike its documentation, hb-old's log_clusters are, well, indeed
    logical, not visual.  Fixup.  Adapted / copied from hb-uniscribe.

diff --git a/src/hb-old.cc b/src/hb-old.cc
index a3b5679..e828ca8 100644
--- a/src/hb-old.cc
+++ b/src/hb-old.cc
@@ -251,6 +251,8 @@ _hb_old_shape (hb_font_t *font,
 
   buffer->guess_properties ();
 
+  bool backward = HB_DIRECTION_IS_BACKWARD (buffer->props.direction);
+
 #define FAIL(...) \
   HB_STMT_START { \
     DEBUG_MSG (OLD, NULL, __VA_ARGS__); \
@@ -285,23 +287,23 @@ retry:
       pchars[chars_len++] = 0xDC00 + ((c - 0x10000) & ((1 << 10) - 1));
     }
   }
-#undef utf16_index
 
 #define ALLOCATE_ARRAY(Type, name, len) \
   name = (Type *) scratch; \
-  scratch += len * sizeof (name[0]); \
-  scratch_size -= len * sizeof (name[0]);
+  scratch += (len) * sizeof ((name)[0]); \
+  scratch_size -= (len) * sizeof ((name)[0]);
 
 
   HB_ShaperItem item = {0};
 
   ALLOCATE_ARRAY (const HB_UChar16, item.string, chars_len);
+  ALLOCATE_ARRAY (unsigned short, item.log_clusters, chars_len + 2);
   item.stringLength = chars_len;
   item.item.pos = 0;
   item.item.length = item.stringLength;
   item.item.script = hb_old_script_from_script (buffer->props.script);
-  item.item.bidiLevel = HB_DIRECTION_IS_FORWARD (buffer->props.direction) ? 0 : 1;
+  item.item.bidiLevel = backward ? 1 : 0;
 
   item.font = old_font;
   item.face = old_face;
@@ -314,14 +316,17 @@ retry:
                 sizeof (HB_GlyphAttributes) +
                 sizeof (HB_Fixed) +
                 sizeof (HB_FixedPoint) +
-                sizeof (unsigned short));
+                sizeof (uint32_t));
 
   item.num_glyphs = num_glyphs;
   ALLOCATE_ARRAY (HB_Glyph, item.glyphs, num_glyphs);
   ALLOCATE_ARRAY (HB_GlyphAttributes, item.attributes, num_glyphs);
   ALLOCATE_ARRAY (HB_Fixed, item.advances, num_glyphs);
   ALLOCATE_ARRAY (HB_FixedPoint, item.offsets, num_glyphs);
-  ALLOCATE_ARRAY (unsigned short, item.log_clusters, num_glyphs);
+  uint32_t *vis_clusters;
+  ALLOCATE_ARRAY (uint32_t, vis_clusters, num_glyphs);
+
+#undef ALLOCATE_ARRAY
 
   if (!HB_ShapeItem (&item))
     return false;
@@ -335,24 +340,48 @@ retry:
   }
   num_glyphs = item.num_glyphs;
 
-#undef ALLOCATE_ARRAY
+  /* Ok, we've got everything we need, now compose output buffer,
+   * very, *very*, carefully! */
+
+  /* Calculate visual-clusters.  That's what we ship. */
+  for (unsigned int i = 0; i < num_glyphs; i++)
+    vis_clusters[i] = -1;
+  for (unsigned int i = 0; i < buffer->len; i++) {
+    uint32_t *p = &vis_clusters[item.log_clusters[buffer->info[i].utf16_index()]];
+    *p = MIN (*p, buffer->info[i].cluster);
+  }
+  if (!backward) {
+    for (unsigned int i = 1; i < num_glyphs; i++)
+      if (vis_clusters[i] == -1)
+        vis_clusters[i] = vis_clusters[i - 1];
+  } else {
+    for (int i = num_glyphs - 2; i >= 0; i--)
+      if (vis_clusters[i] == -1)
+        vis_clusters[i] = vis_clusters[i + 1];
+  }
+
+#undef utf16_index
+
   buffer->ensure (num_glyphs);
+  if (buffer->in_error)
+    FAIL ("Buffer in error");
+
+
+  buffer->len = num_glyphs;
   hb_glyph_info_t *info = buffer->info;
   for (unsigned int i = 0; i < num_glyphs; i++)
   {
     info[i].codepoint = item.glyphs[i];
-    info[i].cluster = item.log_clusters[i];
+    info[i].cluster = vis_clusters[i];
 
     info[i].mask = item.advances[i];
     info[i].var1.u32 = item.offsets[i].x;
     info[i].var2.u32 = item.offsets[i].y;
   }
-  buffer->len = num_glyphs;
 
   buffer->clear_positions ();
 
-  unsigned int count = buffer->len;
-  for (unsigned int i = 0; i < count; ++i) {
+  for (unsigned int i = 0; i < num_glyphs; ++i) {
     hb_glyph_info_t *info = &buffer->info[i];
     hb_glyph_position_t *pos = &buffer->pos[i];
 
diff --git a/src/hb-old/harfbuzz-shaper.h b/src/hb-old/harfbuzz-shaper.h
index 3f32d47..ab65004 100644
--- a/src/hb-old/harfbuzz-shaper.h
+++ b/src/hb-old/harfbuzz-shaper.h
@@ -251,6 +251,7 @@ struct HB_ShaperItem_ {
     HB_Fixed *advances;              /* output: num_glyphs advances */
     HB_FixedPoint *offsets;          /* output: num_glyphs offsets */
     unsigned short *log_clusters;    /* output: for each output glyph, the index in the input of the start of its logical cluster */
+    /* XXX the discription for log_clusters is wrong.  It maps each input position to output glyph position! */
 
     /* internal */
     HB_Bool kerning_applied;         /* output: true if kerning was applied by the shaper */
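The visual-cluster computation in this commit can be read in isolation as the following standalone sketch. It is not the hb-old code itself: the function name visual_clusters and the use of std::vector are assumptions for illustration, and it simplifies by indexing log_clusters directly by input position (the commit goes through utf16_index(), which matters once surrogate pairs are involved). As the XXX note says, log_clusters[i] is taken to be the output glyph position for input position i.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// log_clusters[i]: output glyph index that input position i maps to.
// in_clusters[i]:  cluster value the caller assigned to input position i.
// Returns, for each output glyph, the smallest input cluster value mapped
// to it; glyphs with no input mapped to them inherit from a neighbour.
std::vector<uint32_t> visual_clusters(const std::vector<unsigned short> &log_clusters,
                                      const std::vector<uint32_t> &in_clusters,
                                      std::size_t num_glyphs, bool backward) {
  assert(log_clusters.size() == in_clusters.size());
  const uint32_t unset = UINT32_MAX;  // plays the role of -1 in the commit
  std::vector<uint32_t> vis(num_glyphs, unset);

  // Take the minimum input cluster value for every glyph.
  for (std::size_t i = 0; i < in_clusters.size(); ++i) {
    assert(log_clusters[i] < num_glyphs);
    uint32_t &slot = vis[log_clusters[i]];
    slot = std::min(slot, in_clusters[i]);
  }

  // Fill the holes from the previous glyph (forward text) or from the
  // next glyph (backward text).
  if (!backward) {
    for (std::size_t i = 1; i < num_glyphs; ++i)
      if (vis[i] == unset) vis[i] = vis[i - 1];
  } else {
    for (std::size_t i = num_glyphs; i-- > 0;)
      if (i + 1 < num_glyphs && vis[i] == unset) vis[i] = vis[i + 1];
  }
  return vis;
}
```

For example, three input characters mapping to glyph positions 0, 1, 3 out of four glyphs leave glyph 2 without a value; in the forward case it inherits cluster 1 from glyph 1, in the backward case cluster 2 from glyph 3.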
[HarfBuzz] harfbuzz-ng: Branch 'master' - 2 commits
 src/hb-old.cc                  |   15 ---
 src/hb-old/harfbuzz-shaper.cpp |    2 +-
 2 files changed, 9 insertions(+), 8 deletions(-)

New commits:
commit 2e7f223054d310695bdb3498b2b2b5d17b6cce78
Author: Behdad Esfahbod <beh...@behdad.org>
Date:   Wed Jul 25 19:30:15 2012 -0400

    [hb-old] Fix Arabic cursive positioning

    Backporting from upstream:

    commit b847f24ce855d24f6822bcd9c0006905e81b94d8
    Author: Behdad Esfahbod <beh...@behdad.org>
    Date:   Wed Jul 25 19:29:16 2012 -0400

        [arabic] Fix Arabic cursive positioning

        This was clearly broken in testing.  Who knows...  Fixes for me.
        Test with a Nastaleeq font, or with Arabic Typesetting.

        Backporting from Chromium.

diff --git a/src/hb-old/harfbuzz-shaper.cpp b/src/hb-old/harfbuzz-shaper.cpp
index 5baf971..62886f3 100644
--- a/src/hb-old/harfbuzz-shaper.cpp
+++ b/src/hb-old/harfbuzz-shaper.cpp
@@ -923,7 +923,7 @@ HB_Bool HB_OpenTypePosition(HB_ShaperItem *item, int availableGlyphs, HB_Bool do
             adjustment = HB_FIXED_ROUND(adjustment);
 
             if (positions[i].new_advance) {
-                advances[i] = adjustment;
+                ; //advances[i] = adjustment;
             } else {
                 advances[i] += adjustment;
             }

commit 9550a8c4e8b4e28be60d38c27d59253846ff9569
Author: Behdad Esfahbod <beh...@behdad.org>
Date:   Wed Jul 25 19:22:57 2012 -0400

    [hb-old] Fixup not-enough-space handling

diff --git a/src/hb-old.cc b/src/hb-old.cc
index e828ca8..be0187f 100644
--- a/src/hb-old.cc
+++ b/src/hb-old.cc
@@ -329,14 +329,15 @@ retry:
 #undef ALLOCATE_ARRAY
 
   if (!HB_ShapeItem (&item))
-    return false;
-
-  if (unlikely (item.num_glyphs > num_glyphs))
   {
-    buffer->ensure (buffer->allocated * 2);
-    if (buffer->in_error)
-      FAIL ("Buffer resize failed");
-    goto retry;
+    if (unlikely (item.num_glyphs > num_glyphs))
+    {
+      buffer->ensure (buffer->allocated * 2);
+      if (buffer->in_error)
+        FAIL ("Buffer resize failed");
+      goto retry;
+    }
+    return false;
   }
   num_glyphs = item.num_glyphs;
Re: [HarfBuzz] Mapping output glyphs back to input character
On 07/25/2012 07:17 PM, Khaled Hosny wrote:
> I’m using UTF-16 (playing with porting LibreOffice to HarfBuzz), so how
> surrogate pairs are handled?

See the bottom of hb-buffer.cc. Cluster values after shaping hook back to the UTF-16 index in the original.

If you want to be more impactful, don't port LibreOffice, port iculayout! It's probably 400 lines of code...

behdad
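As an illustration of what "hook back to the UTF-16 index" means for surrogate pairs, here is a standalone sketch that decodes UTF-16 and records, for each code point, the index of its first code unit as the cluster value. It does not use HarfBuzz; the Item struct and the function name are made up, and replacing a lone surrogate with U+FFFD is a choice made for this sketch, not a statement about what hb-buffer.cc does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One decoded character plus the UTF-16 index it came from.
struct Item {
  uint32_t codepoint;
  unsigned cluster;  // index of the first UTF-16 code unit of this character
};

// Decode UTF-16 text, using the code-unit index as the cluster value, so a
// surrogate pair yields one item and the following character's cluster
// skips ahead by two.
std::vector<Item> decode_utf16_with_clusters(const std::u16string &text) {
  std::vector<Item> out;
  for (std::size_t i = 0; i < text.size();) {
    const std::size_t start = i;
    uint32_t c = text[i++];
    if (c >= 0xD800 && c <= 0xDBFF && i < text.size() &&
        text[i] >= 0xDC00 && text[i] <= 0xDFFF) {
      // High surrogate followed by low surrogate: combine them.
      c = 0x10000 + ((c - 0xD800) << 10) + (text[i++] - 0xDC00);
    } else if (c >= 0xD800 && c <= 0xDFFF) {
      c = 0xFFFD;  // lone surrogate (this sketch's choice)
    }
    out.push_back({c, static_cast<unsigned>(start)});
  }
  assert(out.size() <= text.size());
  return out;
}
```

For the text "a", U+1D11E, "b" (four UTF-16 code units), this gives cluster values 0, 1 and 3: the character after the surrogate pair keeps its UTF-16 index, so a client working in UTF-16 can index straight back into its own string.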
Re: [HarfBuzz] Harfbuzz on Windows
On 07/25/2012 07:18 PM, Samiullah Khawaja wrote:
> Hi, What about uniscribe? Will it be able to replace the icu/glib
> dependency. I thought I can get rid of icu/glib dependency by using
> uniscribe. Is my assumption wrong? What does hb-uniscribe.cc file in
> harfbuzz-ng source do?

hb-uniscribe.cc is a backend delegating the whole shaping to the Windows DLL. With that, you don't need any Unicode callbacks whatsoever. It's not meant for production use, but it's actually not that bad. Depends on what your goals really are.

> Are there any Windows APIs I can use and write a layer on it to replace
> the icu/glib dependency?

Not that I know of. I'll produce something tonight.

behdad