[HarfBuzz] Harfbuzz on Windows

2012-07-25 Thread Samiullah Khawaja
Hi Behdad,

I want to build HarfBuzz for Windows. What should I use as a replacement
for the Unicode functions? Is there any native support in Windows that I
can use to provide them?

-- 
Samiullah Khawaja
___
HarfBuzz mailing list
HarfBuzz@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/harfbuzz


[HarfBuzz] HarfBuzz.old backend in new HarfBuzz

2012-07-25 Thread Behdad Esfahbod
Hi,

Quick heads up that I just pushed a new backend in HarfBuzz that uses the
HarfBuzz.old shaper.  All code is in-tree, so no dependencies.  You can choose
the old shaper by its name, "old", using --shaper or hb_shape_full().

Pretty much like the Uniscribe and CoreText backends, this new backend is
primarily for testing, and may be removed in the future (after I have
convinced everyone to move to the real HarfBuzz).

Coming soon...  iculayout backend perhaps.  And let's see if Jonathan will do a
DWrite backend any time soon.


Cheers,
behdad


[HarfBuzz] harfbuzz-ng: Branch 'master'

2012-07-25 Thread Behdad Esfahbod
 src/hb-unicode-private.hh |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

New commits:
commit 35bdab3cf1f0836807160e3ce93766c321b32e8c
Author: Behdad Esfahbod beh...@behdad.org
Date:   Wed Jul 25 11:59:52 2012 -0400

Minor

diff --git a/src/hb-unicode-private.hh b/src/hb-unicode-private.hh
index 0ba2fcc..1ce5adc 100644
--- a/src/hb-unicode-private.hh
+++ b/src/hb-unicode-private.hh
@@ -151,7 +151,7 @@ _hb_unicode_is_zero_width (hb_codepoint_t ch)
   return ((ch & ~0x007F) == 0x2000 && (hb_in_ranges<hb_codepoint_t> (ch,
                                                                       0x200B, 0x200F,
                                                                       0x202A, 0x202E,
-                                                                      0x2060, 0x2063) ||
+                                                                      0x2060, 0x2064) ||
                                        (ch == 0x2028))) ||
           unlikely (ch == 0x0009 ||
                     ch == 0x00AD ||


Re: [HarfBuzz] Multiple substitution and mark positioning

2012-07-25 Thread Behdad Esfahbod
This also happens with Arabic Typesetting I assume?

b

On 06/12/2012 06:31 AM, Khaled Hosny wrote:
 I’m not sure if this is related, but I now get no mkmk positioning when
 the marks are “inserted” using multiple substitution. For example, “للّٰه”
 is positioned correctly, while “لله” is not, though it has the same mark
 glyphs, except that they are added by multiple substitution.
 
 [uni0647.fina_Lellah=4+721|uni0670=1@-267,-162|uni0651=1@-277,-440|uni0644.medi_Lellah=1+473|uni0644.init_Lellah=0+319]
 
 vs.:
 
 [uni0647.fina_Lellah=2+721|uni0670=1@-245,-440|uni0651=1@-277,-440|uni0644.medi_Lellah=1+473|uni0644.init_Lellah=0+319]
 
 Regards,
  Khaled
 
 On Fri, Jun 08, 2012 at 10:14:19PM -0400, Behdad Esfahbod wrote:
 Hi Khaled and others,

 I fixed this, among other things, including a major mlig and mkmk regression.
  Please test.

 behdad

 On 05/12/2012 08:54 AM, Khaled Hosny wrote:
 Hi all,

 There seems to be a difference between HarfBuzz and Uniscribe on how to
 handle mark positioning when there is multiple glyph substitution,
 namely HB seems to apply the mark to the last component while USP
 applies it to the first component.

 In other words, if there is a <base> → <base₁><base₂> substitution, the
 sequence <base><mark> will be rendered as if it was <base₁><base₂><mark>
 with HB, but as <base₁><mark><base₂> with USP.

 Using hb-shape with “uniscribe” shaper, and the word “سَتا” and Arabic
 Typesetting font, I get

   
 [uniFE8E=3+343|uniFE98=2+376|uni064E=0@501,-260|uni0640.curvehalf=0@,34+152|uniFEB3=0@,34+840]
  
 but with “ot” shaper, I get:

   
 [uniFE8E=3+343|uniFE98=2+376|uni064E=0@-11,-310|uni0640.curvehalf=0@,34+152|uniFEB3=0@,34+840]
  
 though the glyph string is the same, the position of the mark is clearly
 different.

 (background: I need this to contextually insert tatweel to avoid mark
 collision in “crowded” places, but with the difference between both
 engines this can’t be reliably done without breaking mark positioning in
 one of them).

 Regards,
  Khaled


Re: [HarfBuzz] Harfbuzz Sinhala (si) script support status update

2012-07-25 Thread Harshula
On Mon, 2012-07-23 at 02:36 +1000, Harshula wrote:
 Hi Behdad and Jonathan,
 
 1) I did a quick test of the latest commits. Basic Sinhala shaping seems
 to have improved for Bhashitha font (IIRC, the original version was for
 Windows) and gone backwards with GNU Free Font and LKLUG font.
 
 The following file contains strings that represent the minimal shaping
 support required:
 http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.txt
 
 This is how the output should look:
 http://git.savannah.gnu.org/cgit/sinhala.git/plain/patches/icu-sinhala-rendering.png
 
 FreeSerif font:
 http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip
 
 LKLUG font:
 http://sinhala.sourceforge.net/files/lklug.qa.ttf
 
 Both Pango and ICU are able to shape the content of
 icu-sinhala-rendering.txt correctly using either FreeSerif or LKLUG
 fonts.

Here are more details about the problem. The new shaper renders කො
(ko) incorrectly with FreeSerif and LKLUG fonts but renders correctly
with Bhashitha font (IIRC, originated from Windows). The old shaper
renders the string correctly using all three fonts.

String: කො
Unicode Sequence: U+0D9A,U+0DDC (consonant + split dependent vowel)

U+0DDC = U+0DD9 + U+0DCF
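
As a rough illustration of that decomposition, the sketch below splits U+0DDC into U+0DD9 + U+0DCF and moves the left part in front of the consonant, which gives the order uni0DD9, uni0D9A, uni0DCF seen in the Bhashitha output. The helper name reorder_split_vowel and its one-entry table are assumptions made for this example; this is not how either shaper is actually written.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a HarfBuzz API): decompose the split vowel
// U+0DDC into U+0DD9 + U+0DCF and place the left part before the
// preceding consonant.  Only U+0DDC is handled in this sketch.
std::vector<uint32_t> reorder_split_vowel (const std::vector<uint32_t> &in)
{
  std::vector<uint32_t> out;
  for (uint32_t ch : in)
  {
    if (ch == 0x0DDC && !out.empty ())
    {
      // The pre-base part U+0DD9 goes before the consonant,
      // the post-base part U+0DCF stays after it.
      uint32_t consonant = out.back ();
      out.pop_back ();
      out.push_back (0x0DD9);
      out.push_back (consonant);
      out.push_back (0x0DCF);
    }
    else
      out.push_back (ch);
  }
  return out;
}
```

Calling reorder_split_vowel on the sequence U+0D9A, U+0DDC yields three code points in visual order; whether a font then needs U+0DCF or a precomposed glyph is up to its GSUB rules.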

Using new shaper


Bhashitha:
 [uni0DD9=0+1091|uni0D9A=0+1744|uni0DCF=0+629]

FreeSerif:
 [e2_sinh=0+707|ka_sinh=0+915|o2_sinh=0+1308]

LKLUG:
 [uni0DD9=0+727|uni0D9A=0+913|uni0DDC=0+1329]

Using old shaper


Bhashitha:
 [uni0DD9=0+1091|uni0D9A=0+1744|uni0DCF=0+629]

FreeSerif:
 [e2_sinh=0+707|ka_sinh=0+915|aa2_sinh=0+336]

LKLUG:
 [uni0DD9=0+727|uni0D9A=0+913|uni0DCF=0+356]


NOTE: It appears LKLUG (using 'liga') and FreeSerif (using multiple
subs) construct U+0DDC from U+0DD9 and U+0DCF. However, Bhashitha
appears to deconstruct U+0DDC to form U+0DCF. I'm not good with font
rule construction, so it would be advisable for you to inspect the font
for accurate details.

Thanks again for adding the old shaper!!!

cya,
#



Re: [HarfBuzz] Harfbuzz on Windows

2012-07-25 Thread Grigori Goronzy
On 07/25/2012 09:31 AM, Samiullah Khawaja wrote:
 Hi Behdad,
 
 I want to build HarfBuzz for Windows. What should I use as a replacement
 for the Unicode functions? Is there any native support in Windows that I
 can use to provide them?
 

ICU is (more or less) the recommended choice. However, I actually had
less trouble getting glib to work on Windows, though that might be the fault
of MinGW.

Unfortunately both ICU and glib are huge frameworks with their own set
of dependencies. A stripped-down standalone library that provides a
unicode character database and nothing more would be great to have.

I implemented HarfBuzz support in libass, but projects like Aegisub or
VLC still ship without HarfBuzz on Windows because the ICU/glib
requirement is such a pain. :(

Best regards
Grigori



Re: [HarfBuzz] Harfbuzz on Windows

2012-07-25 Thread Behdad Esfahbod
On 07/25/2012 04:58 PM, Grigori Goronzy wrote:
 Unfortunately both ICU and glib are huge frameworks with their own set
 of dependencies. A stripped-down standalone library that provides a
 unicode character database and nothing more would be great to have.

I'm planning on shipping a barebone set of Unicode data internally in HarfBuzz
for such usecases.  No concrete timeline right now, but it's an afternoon's
worth of hacking.

behdad


Re: [HarfBuzz] Multiple substitution and mark positioning

2012-07-25 Thread Khaled Hosny
This only applies to marks that result from multiple substitution, i.e.
in Amiri the middle lam of لله is substituted with
<lam><shadda><smallalef>. I don’t think Arabic Typesetting has anything
like that.

On Wed, Jul 25, 2012 at 01:58:08PM -0400, Behdad Esfahbod wrote:
 This also happens with Arabic Typesetting I assume?
 


Re: [HarfBuzz] Multiple substitution and mark positioning

2012-07-25 Thread Jonathan Kew

On 25/7/12 22:13, Khaled Hosny wrote:

This only applies to marks that result from multiple substitution, i.e.
in Amiri the middle lam of لله is substituted with
<lam><shadda><smallalef>. I don’t think Arabic Typesetting has anything
like that.



Just wondering (without checking the code yet...) - is it possible that 
we're failing to set the glyph category from GDEF properly for marks 
that are inserted by GSUB rules like this, and as a result the GPOS 
lookups don't match as expected?






Re: [HarfBuzz] Multiple substitution and mark positioning

2012-07-25 Thread Behdad Esfahbod
On 07/25/2012 06:14 PM, Jonathan Kew wrote:
 On 25/7/12 22:13, Khaled Hosny wrote:
 This only applies to marks that result from multiple substitution, i.e.
 in Amiri the middle lam of لله is substituted with
 <lam><shadda><smallalef>. I don’t think Arabic Typesetting has anything
 like that.

 
 Just wondering (without checking the code yet...) - is it possible that we're
 failing to set the glyph category from GDEF properly for marks that are
 inserted by GSUB rules like this, and as a result the GPOS lookups don't match
 as expected?

No, I think I know where the problem is.  It's somewhere in the lig_id /
lig_comp matching code...  Lemme see.

b


[HarfBuzz] harfbuzz-ng: Branch 'master'

2012-07-25 Thread Behdad Esfahbod
 src/hb-ot-layout-gpos-table.hh |   10 +-
 src/hb-ot-layout-private.hh    |   23 +--
 2 files changed, 26 insertions(+), 7 deletions(-)

New commits:
commit a3313e54008167e415b72c780ca7b9cda958d07e
Author: Behdad Esfahbod beh...@behdad.org
Date:   Wed Jul 25 18:37:51 2012 -0400

[GPOS] Fix MarkMarkPos applied to results of MultipleSubst

This was broken as a result of 7b84c536c10ab90ed96a033d88e9ad232d46c5b8.
As Khaled reported, MarkMark positioning was broken with glyphs
resulting from a MultipleSubst.  Fixed.  Test with the ALLAH character
in Amiri.

diff --git a/src/hb-ot-layout-gpos-table.hh b/src/hb-ot-layout-gpos-table.hh
index e95aa3e..5b71407 100644
--- a/src/hb-ot-layout-gpos-table.hh
+++ b/src/hb-ot-layout-gpos-table.hh
@@ -1169,11 +1169,11 @@ struct MarkMarkPosFormat1
     unsigned int j = skippy_iter.idx;
 
     /* Two marks match only if they belong to the same base, or same component
-     * of the same ligature.  That is, the component numbers must match, and
-     * if those are non-zero, the ligid number should also match. */
-    if ((get_lig_comp (c->buffer->info[j]) != get_lig_comp (c->buffer->cur())) ||
-        (get_lig_comp (c->buffer->info[j]) > 0 &&
-         get_lig_id (c->buffer->info[j]) != get_lig_id (c->buffer->cur())))
+     * of the same ligature.  That is, the lig_id numbers must match, and
+     * if those are non-zero, the lig_comp number should also match. */
+    if ((get_lig_id (c->buffer->info[j]) != get_lig_id (c->buffer->cur())) ||
+        (get_lig_id (c->buffer->info[j]) > 0 &&
+         get_lig_comp (c->buffer->info[j]) != get_lig_comp (c->buffer->cur())))
       return TRACE_RETURN (false);
 
     unsigned int mark2_index = (this+mark2Coverage) (c->buffer->info[j].codepoint);
diff --git a/src/hb-ot-layout-private.hh b/src/hb-ot-layout-private.hh
index 7a1c7e3..ba375aa 100644
--- a/src/hb-ot-layout-private.hh
+++ b/src/hb-ot-layout-private.hh
@@ -68,8 +68,27 @@ _hb_ot_layout_skip_mark (hb_face_t*face,
  * GSUB/GPOS
  */
 
-/* unique ligature id */
-/* component number in the ligature (0 = base) */
+/* lig_id / lig_comp
+ *
+ * When a ligature is formed:
+ *
+ *   - The ligature glyph and any marks in between all get a unique lig_id,
+ *   - The ligature glyph will get lig_comp = 0
+ *   - The marks get lig_comp > 0, reflecting which component of the ligature
+ *     they were applied to.
+ *   - This is used in GPOS to attach marks to the right component of a ligature
+ *     in MarkLigPos.
+ *
+ * When a multiple-substitution is done:
+ *
+ *   - All resulting glyphs will have lig_id = 0,
+ *   - The resulting glyphs will have lig_comp = 0, 1, 2, ... respectively.
+ *   - This is used in GPOS to attack marks to the first component of a
+ *     multiple substitution in MarkBasePos.
+ *
+ * The numbers are also used in GPOS to do mark-to-mark positioning only
+ * to marks that belong to the same component of a ligature in MarkMarPos.
+ */
 static inline void
 set_lig_props (hb_glyph_info_t &info, unsigned int lig_id, unsigned int lig_comp)
 {
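
To see why swapping the two checks matters, here is a small standalone sketch of the matching rule before and after this commit. The LigProps struct and the two function names are made up for the example (the real code reads these values out of hb_glyph_info_t); the lig_id / lig_comp values follow the comment block above, where glyphs produced by a multiple substitution all get lig_id = 0 and lig_comp = 0, 1, 2, ...

```cpp
#include <cassert>

// Simplified stand-in for the per-glyph ligature properties; an assumption
// for this sketch, not the actual HarfBuzz data layout.
struct LigProps
{
  unsigned int lig_id;
  unsigned int lig_comp;
};

// Rule before the fix: component numbers must match, and if they are
// non-zero, the lig_id must match too.
bool marks_match_old (LigProps a, LigProps b)
{
  if (a.lig_comp != b.lig_comp) return false;
  if (a.lig_comp > 0 && a.lig_id != b.lig_id) return false;
  return true;
}

// Rule after the fix: lig_id must match, and if it is non-zero,
// the component number must match too.
bool marks_match_new (LigProps a, LigProps b)
{
  if (a.lig_id != b.lig_id) return false;
  if (a.lig_id > 0 && a.lig_comp != b.lig_comp) return false;
  return true;
}
```

Two marks coming out of the same multiple substitution carry {0, 1} and {0, 2}: the old rule rejects the pair because the component numbers differ, while the new rule accepts it because both have lig_id = 0. Marks on different components of a real ligature (same non-zero lig_id, different lig_comp) are still rejected.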


Re: [HarfBuzz] Multiple substitution and mark positioning

2012-07-25 Thread Behdad Esfahbod
Fixed.

On 07/25/2012 05:13 PM, Khaled Hosny wrote:
 This only applies to marks that result from multiple substitution, i.e.
 in Amiri the middle lam of لله is substituted with
 <lam><shadda><smallalef>. I don’t think Arabic Typesetting has anything
 like that.
 


Re: [HarfBuzz] Multiple substitution and mark positioning

2012-07-25 Thread Khaled Hosny
Thanks Behdad.

On Wed, Jul 25, 2012 at 06:39:29PM -0400, Behdad Esfahbod wrote:
 Fixed.
 


Re: [HarfBuzz] Mapping output glyphs back to input character

2012-07-25 Thread Khaled Hosny
On Sun, Jul 22, 2012 at 11:37:23PM -0400, Behdad Esfahbod wrote:
 Hi Khaled,
 
 On 07/21/2012 05:49 AM, Khaled Hosny wrote:
  How do I map output glyphs back to input characters? I assume I've to
  use clusters for that, but I can't make much sense of the cluster
  numbers I'm seeing and don't seem to find any explanation for them.
 
 When you add text to a hb_buffer_t, you set a cluster number for each
 character.  The functions hb_buffer_add_utf* implicitly use the index into the
 input string for the cluster.  Ie. when using the UTF-8 version, UTF-8 indices
 are used.
 
 Note that hb-view/hb-shape by default use UTF-32 cluster numbers (ie.
 character-count instead of byte-count).  You can change that using
 --utf8-clusters.

I’m using UTF-16 (playing with porting LibreOffice to HarfBuzz), so how
are surrogate pairs handled?

 The shaping process implicitly segments the input text + output glyphs in a
 series of clusters.  So you can think of, for LTR text, first cluster followed
 by second cluster, followed by third cluster, etc, where each cluster contains
 a number of characters and a number of glyphs.
 
 Now, the hb_glyph_info_t::cluster member after shaping simply points to the
 minimum value of that member for all the characters that belong to the 
 cluster.
 
 For RTL it's similar, though in reverse direction.
 
 Quick example.  If you add text for "differ", then initially characters get
 cluster values 0,1,2,3,4,5 respectively.  After shaping, if the 'ff' ligature
 was formed, you will get five glyphs, with cluster values 0,1,2,4,5.  This
 means that the two characters that originally had cluster values 2 and 3 are
 represented by the sole glyph having the cluster value 2.
 
 Hope that helps.
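
The "differ" example can be checked with a small standalone sketch. It assumes we already know which output glyph each input character ended up in (the char_to_glyph array, which is hypothetical and not something HarfBuzz hands out); each glyph then gets the minimum cluster value of the characters merged into it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: char_cluster[i] is the cluster value set on input for
// character i, char_to_glyph[i] is the index of the glyph that character i
// became part of.  Each glyph reports the minimum cluster of its characters.
std::vector<uint32_t> glyph_clusters (const std::vector<uint32_t> &char_cluster,
                                      const std::vector<unsigned int> &char_to_glyph,
                                      unsigned int num_glyphs)
{
  std::vector<uint32_t> out (num_glyphs, UINT32_MAX);
  for (std::size_t i = 0; i < char_cluster.size (); i++)
    out[char_to_glyph[i]] = std::min (out[char_to_glyph[i]], char_cluster[i]);
  return out;
}
```

With clusters 0..5 for "differ" and both f characters mapped to glyph 2, the five glyphs come out with cluster values 0, 1, 2, 4, 5, matching the description above.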

Thanks Behdad, this was very helpful.

Regards,
 Khaled


Re: [HarfBuzz] Harfbuzz on Windows

2012-07-25 Thread Samiullah Khawaja
Hi,

What about Uniscribe? Will it be able to replace the ICU/glib dependency? I
thought I could get rid of the ICU/glib dependency by using Uniscribe. Is my
assumption wrong? What does the hb-uniscribe.cc file in the harfbuzz-ng source do?

Are there any Windows APIs I can use, and write a layer on top of, to replace
the ICU/glib dependency?

Thanks for the help.
Sami

On Thu, Jul 26, 2012 at 2:11 AM, Behdad Esfahbod beh...@behdad.org wrote:

 On 07/25/2012 04:58 PM, Grigori Goronzy wrote:
  Unfortunately both ICU and glib are huge frameworks with their own set
  of dependencies. A stripped-down standalone library that provides a
  unicode character database and nothing more would be great to have.

 I'm planning on shipping a barebone set of Unicode data internally in
 HarfBuzz
 for such usecases.  No concrete timeline right now, but it's an afternoon's
 worth of hacking.

 behdad




-- 
Samiullah Khawaja
Software Engineer
email: sami.khaw...@gmail.com
voice: +(92) 0321-4184324


[HarfBuzz] harfbuzz-ng: Branch 'master'

2012-07-25 Thread Behdad Esfahbod
 src/hb-old.cc                |   51 +--
 src/hb-old/harfbuzz-shaper.h |    1 
 src/hb-uniscribe.cc          |    5 ++--
 3 files changed, 44 insertions(+), 13 deletions(-)

New commits:
commit 91e721ea8693205f4f738bca97a5055ee75cf463
Author: Behdad Esfahbod beh...@behdad.org
Date:   Wed Jul 25 19:20:34 2012 -0400

[hb-old] Fix clusters

Unlike its documentation, hb-old's log_clusters are, well, indeed
logical, not visual.  Fixup.  Adapted / copied from hb-uniscribe.

diff --git a/src/hb-old.cc b/src/hb-old.cc
index a3b5679..e828ca8 100644
--- a/src/hb-old.cc
+++ b/src/hb-old.cc
@@ -251,6 +251,8 @@ _hb_old_shape (hb_font_t  *font,
 
   buffer->guess_properties ();
 
+  bool backward = HB_DIRECTION_IS_BACKWARD (buffer->props.direction);
+
 #define FAIL(...) \
   HB_STMT_START { \
 DEBUG_MSG (OLD, NULL, __VA_ARGS__); \
@@ -285,23 +287,23 @@ retry:
   pchars[chars_len++] = 0xDC00 + ((c - 0x10000) & ((1 << 10) - 1));
 }
   }
-#undef utf16_index
 
 
 #define ALLOCATE_ARRAY(Type, name, len) \
   name = (Type *) scratch; \
-  scratch += len * sizeof (name[0]); \
-  scratch_size -= len * sizeof (name[0]);
+  scratch += (len) * sizeof ((name)[0]); \
+  scratch_size -= (len) * sizeof ((name)[0]);
 
 
   HB_ShaperItem item = {0};
 
   ALLOCATE_ARRAY (const HB_UChar16, item.string, chars_len);
+  ALLOCATE_ARRAY (unsigned short, item.log_clusters, chars_len + 2);
   item.stringLength = chars_len;
   item.item.pos = 0;
   item.item.length = item.stringLength;
   item.item.script = hb_old_script_from_script (buffer->props.script);
-  item.item.bidiLevel = HB_DIRECTION_IS_FORWARD (buffer->props.direction) ? 0 : 1;
+  item.item.bidiLevel = backward ? 1 : 0;
 
   item.font = old_font;
   item.face = old_face;
@@ -314,14 +316,17 @@ retry:
sizeof (HB_GlyphAttributes) +
sizeof (HB_Fixed) +
sizeof (HB_FixedPoint) +
-   sizeof (unsigned short));
+   sizeof (uint32_t));
 
   item.num_glyphs = num_glyphs;
   ALLOCATE_ARRAY (HB_Glyph, item.glyphs, num_glyphs);
   ALLOCATE_ARRAY (HB_GlyphAttributes, item.attributes, num_glyphs);
   ALLOCATE_ARRAY (HB_Fixed, item.advances, num_glyphs);
   ALLOCATE_ARRAY (HB_FixedPoint, item.offsets, num_glyphs);
-  ALLOCATE_ARRAY (unsigned short, item.log_clusters, num_glyphs);
+  uint32_t *vis_clusters;
+  ALLOCATE_ARRAY (uint32_t, vis_clusters, num_glyphs);
+
+#undef ALLOCATE_ARRAY
 
   if (!HB_ShapeItem (&item))
 return false;
@@ -335,24 +340,48 @@ retry:
   }
   num_glyphs = item.num_glyphs;
 
-#undef ALLOCATE_ARRAY
+  /* Ok, we've got everything we need, now compose output buffer,
+   * very, *very*, carefully! */
+
+  /* Calculate visual-clusters.  That's what we ship. */
+  for (unsigned int i = 0; i < num_glyphs; i++)
+    vis_clusters[i] = -1;
+  for (unsigned int i = 0; i < buffer->len; i++) {
+    uint32_t *p = &vis_clusters[item.log_clusters[buffer->info[i].utf16_index()]];
+    *p = MIN (*p, buffer->info[i].cluster);
+  }
+  if (!backward) {
+    for (unsigned int i = 1; i < num_glyphs; i++)
+      if (vis_clusters[i] == -1)
+        vis_clusters[i] = vis_clusters[i - 1];
+  } else {
+    for (int i = num_glyphs - 2; i >= 0; i--)
+      if (vis_clusters[i] == -1)
+        vis_clusters[i] = vis_clusters[i + 1];
+  }
+
+#undef utf16_index
 
+  buffer->ensure (num_glyphs);
+  if (buffer->in_error)
+    FAIL ("Buffer in error");
+
+
+  buffer->len = num_glyphs;
   hb_glyph_info_t *info = buffer->info;
   for (unsigned int i = 0; i < num_glyphs; i++)
   {
 info[i].codepoint = item.glyphs[i];
-info[i].cluster = item.log_clusters[i];
+info[i].cluster = vis_clusters[i];
 
 info[i].mask = item.advances[i];
 info[i].var1.u32 = item.offsets[i].x;
 info[i].var2.u32 = item.offsets[i].y;
   }
-  buffer->len = num_glyphs;
 
   buffer->clear_positions ();
 
-  unsigned int count = buffer->len;
-  for (unsigned int i = 0; i < count; ++i) {
+  for (unsigned int i = 0; i < num_glyphs; ++i) {
     hb_glyph_info_t *info = &buffer->info[i];
     hb_glyph_position_t *pos = &buffer->pos[i];
 
diff --git a/src/hb-old/harfbuzz-shaper.h b/src/hb-old/harfbuzz-shaper.h
index 3f32d47..ab65004 100644
--- a/src/hb-old/harfbuzz-shaper.h
+++ b/src/hb-old/harfbuzz-shaper.h
@@ -251,6 +251,7 @@ struct HB_ShaperItem_ {
 HB_Fixed *advances; /* output: num_glyphs advances */
 HB_FixedPoint *offsets; /* output: num_glyphs offsets */
 unsigned short *log_clusters;   /* output: for each output glyph, the index in the input of the start of its logical cluster */
+/* XXX the discription for log_clusters is wrong.  It maps each input position to output glyph position! */
 
 /* internal */
 HB_Bool kerning_applied;/* output: true if kerning was applied by the shaper */
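
The cluster fixup in the hb-old.cc hunk above can be read in isolation as the following sketch. It is a simplified restatement under a few assumptions: one log_clusters entry per input character (the real code indexes through utf16_index()), plain std::vector instead of the scratch buffer, and a made-up function name visual_clusters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// log_clusters[i] maps input character i to an output glyph position, as the
// XXX comment in harfbuzz-shaper.h describes.  char_cluster[i] is the cluster
// value of input character i.  Returns one cluster value per output glyph.
std::vector<uint32_t> visual_clusters (const std::vector<unsigned short> &log_clusters,
                                       const std::vector<uint32_t> &char_cluster,
                                       unsigned int num_glyphs,
                                       bool backward)
{
  const uint32_t unset = UINT32_MAX;
  std::vector<uint32_t> vis (num_glyphs, unset);

  // Each glyph that a character points at takes the minimum cluster value.
  for (std::size_t i = 0; i < char_cluster.size (); i++)
  {
    uint32_t &slot = vis[log_clusters[i]];
    slot = std::min (slot, char_cluster[i]);
  }

  // Glyphs nobody pointed at inherit from their neighbour: the previous
  // glyph for forward text, the next glyph for backward text.
  if (!backward)
  {
    for (unsigned int i = 1; i < num_glyphs; i++)
      if (vis[i] == unset)
        vis[i] = vis[i - 1];
  }
  else
  {
    for (int i = (int) num_glyphs - 2; i >= 0; i--)
      if (vis[i] == unset)
        vis[i] = vis[i + 1];
  }
  return vis;
}
```

For example, three characters with clusters 0, 1, 2 that map to glyph positions 0, 1, 3 out of four glyphs give 0, 1, 1, 2 in the forward case: the unclaimed glyph 2 joins the cluster of the glyph before it.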

[HarfBuzz] harfbuzz-ng: Branch 'master' - 2 commits

2012-07-25 Thread Behdad Esfahbod
 src/hb-old.cc                  |   15 ---
 src/hb-old/harfbuzz-shaper.cpp |    2 +-
 2 files changed, 9 insertions(+), 8 deletions(-)

New commits:
commit 2e7f223054d310695bdb3498b2b2b5d17b6cce78
Author: Behdad Esfahbod beh...@behdad.org
Date:   Wed Jul 25 19:30:15 2012 -0400

[hb-old] Fix Arabic cursive positioning

Backporting from upstream:

commit b847f24ce855d24f6822bcd9c0006905e81b94d8
Author: Behdad Esfahbod beh...@behdad.org
Date:   Wed Jul 25 19:29:16 2012 -0400

[arabic] Fix Arabic cursive positioning

This was clearly broken in testing.  Who knows...  Fixes for me.
Test with a Nastaleeq font, or with Arabic Typesetting.

Backporting from Chromium.

diff --git a/src/hb-old/harfbuzz-shaper.cpp b/src/hb-old/harfbuzz-shaper.cpp
index 5baf971..62886f3 100644
--- a/src/hb-old/harfbuzz-shaper.cpp
+++ b/src/hb-old/harfbuzz-shaper.cpp
@@ -923,7 +923,7 @@ HB_Bool HB_OpenTypePosition(HB_ShaperItem *item, int availableGlyphs, HB_Bool do
 adjustment = HB_FIXED_ROUND(adjustment);
 
 if (positions[i].new_advance) {
-advances[i] = adjustment;
+; //advances[i] = adjustment;
 } else {
 advances[i] += adjustment;
 }
commit 9550a8c4e8b4e28be60d38c27d59253846ff9569
Author: Behdad Esfahbod beh...@behdad.org
Date:   Wed Jul 25 19:22:57 2012 -0400

[hb-old] Fixup not-enough-space handling

diff --git a/src/hb-old.cc b/src/hb-old.cc
index e828ca8..be0187f 100644
--- a/src/hb-old.cc
+++ b/src/hb-old.cc
@@ -329,14 +329,15 @@ retry:
 #undef ALLOCATE_ARRAY
 
   if (!HB_ShapeItem (&item))
-    return false;
-
-  if (unlikely (item.num_glyphs > num_glyphs))
   {
-    buffer->ensure (buffer->allocated * 2);
-    if (buffer->in_error)
-      FAIL ("Buffer resize failed");
-    goto retry;
+    if (unlikely (item.num_glyphs > num_glyphs))
+    {
+      buffer->ensure (buffer->allocated * 2);
+      if (buffer->in_error)
+        FAIL ("Buffer resize failed");
+      goto retry;
+    }
+    return false;
   }
   num_glyphs = item.num_glyphs;
 


Re: [HarfBuzz] Mapping output glyphs back to input character

2012-07-25 Thread Behdad Esfahbod
On 07/25/2012 07:17 PM, Khaled Hosny wrote:
 On Sun, Jul 22, 2012 at 11:37:23PM -0400, Behdad Esfahbod wrote:
 Hi Khaled,

 On 07/21/2012 05:49 AM, Khaled Hosny wrote:
 How do I map output glyphs back to input characters? I assume I've to
 use clusters for that, but I can't make much sense of the cluster
 numbers I'm seeing and don't seem to find any explanation for them.

 When you add text to a hb_buffer_t, you set a cluster number for each
 character.  The functions hb_buffer_add_utf* implicitly use the index into the
 input string for the cluster.  Ie. when using the UTF-8 version, UTF-8 indices
 are used.

 Note that hb-view/hb-shape by default use UTF-32 cluster numbers (ie.
 character-count instead of byte-count).  You can change that using
 --utf8-clusters.
 
 I’m using UTF-16 (playing with porting LibreOffice to HarfBuzz), so how
 are surrogate pairs handled?

See bottom of hb-buffer.cc.  cluster values after shaping hook back to
UTF-16 index in the original.
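
Roughly, that means a surrogate pair becomes one character whose cluster value is a UTF-16 code unit index. The standalone sketch below shows the idea; it is not the code from hb-buffer.cc, the names Utf16Item and add_utf16 are made up, and using the index of the leading surrogate as the cluster is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded character plus the UTF-16 index it came from.
struct Utf16Item
{
  uint32_t codepoint;
  uint32_t cluster;
};

// Hypothetical decoder: combines surrogate pairs into one code point and
// records the index of the first code unit as the cluster value (assumed).
// Unpaired surrogates are passed through unchanged in this sketch.
std::vector<Utf16Item> add_utf16 (const std::vector<uint16_t> &text)
{
  std::vector<Utf16Item> out;
  for (std::size_t i = 0; i < text.size (); )
  {
    std::size_t start = i;
    uint32_t c = text[i++];
    if (c >= 0xD800 && c <= 0xDBFF && i < text.size () &&
        text[i] >= 0xDC00 && text[i] <= 0xDFFF)
    {
      c = 0x10000 + ((c - 0xD800) << 10) + (text[i] - 0xDC00);
      i++;
    }
    out.push_back (Utf16Item {c, (uint32_t) start});
  }
  return out;
}
```

For the code units 0x0061, 0xD835, 0xDC00, 0x0062 this gives three characters with clusters 0, 1 and 3, so the cluster values can be used directly as offsets into the original UTF-16 string.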

If you want to be more impactful, don't port LibreOffice, port iculayout!
It's probably 400 lines of code...

behdad



Re: [HarfBuzz] Harfbuzz on Windows

2012-07-25 Thread Behdad Esfahbod
On 07/25/2012 07:18 PM, Samiullah Khawaja wrote:
 Hi,
 
 What about Uniscribe? Will it be able to replace the ICU/glib dependency? I
 thought I could get rid of the ICU/glib dependency by using Uniscribe. Is my
 assumption wrong? What does the hb-uniscribe.cc file in the harfbuzz-ng source do?

hb-uniscribe.cc is a backend delegating the whole shaping to the Windows DLL.
 With that, you don't need any Unicode callbacks whatsoever.

It's not meant for production use, but it's actually not that bad.  Depends on
what your goals really are.

 Are there any Windows APIs I can use, and write a layer on top of, to replace
 the ICU/glib dependency?

Not that I know of.  I'll produce something tonight.

behdad
