On 2/8/15 17:45, Simon Cozens wrote:
Here's an interesting one I came across when implementing Uyghur
hyphenation. The trick in hyphenated Uyghur is to use a ZWJ to ensure
that the last character of hyphenated Arabic morphemes remains in medial
form. However, when I send an Arabic + ZWJ + hyphen sequence to HarfBuzz,
it inserts a space between the hyphen and the Arabic:

zwj = SU.utf8char(0x200d)
text = "تئەۋ" .. zwj .. "-"
SILE.shaper:shapeToken(text, SILE.font.loadDefaults({ font = "Amiri",
direction = "RTL"  }))
{
   {
     codepoint = 16,
     depth = -1.943359375,
     height = 2.666015625,
     name = "hyphen",
     width = 3.681640625,
   },
   {
     codepoint = 3,
     depth = 0,
     height = 0,
     name = "space",
     width = 2.9296875,
   },
   {
     codepoint = 552,
     depth = 2.24609375,
     height = 6.2841796875,
     name = "uni06CB",
     width = 4.0087890625,
   },
  {
     codepoint = 2226,
     depth = 0.048828125,
     height = 4.580078125,
     name = "uni06D5.fina",
     width = 3.7939453125,
   },
   {
     codepoint = 3024,
     depth = 0.0048828125,
     height = 5.078125,
     name = "uni0626.medi_BaaBaaInit",
     width = 1.6845703125,
   },
   {
     codepoint = 3732,
     depth = 0.0634765625,
     height = 4.8779296875,
     name = "uni062A.init_BaaBaaIsol",
     width = 3.193359375,
   },
}

Simplifying the case even further:

SILE.shaper:shapeToken(zwj, SILE.font.loadDefaults({ font = "Amiri",
direction = "RTL"  }))
{
   {
     codepoint = 3,
     depth = 0,
     height = 0,
     name = "space",
     width = 2.9296875,
   },
}

I would have hoped that a zero-width joiner had... zero width.

It's expected that you'll see a <space> glyph here, because HarfBuzz substitutes it for default-ignorable characters; however, it also sets that glyph's advance width to zero, so I'm not sure why you're seeing a non-zero advance.
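
As a quick sanity check on the code points involved, here is a minimal Python sketch that uses only the standard library's unicodedata module. The helper name describe is hypothetical. Note that unicodedata does not expose the Default_Ignorable_Code_Point property itself, so the general category (Cf, "format") is only a rough proxy for it.

```python
import unicodedata


def describe(cp):
    """Return (U+XXXX label, Unicode name, general category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))


# U+200D is the joiner used in the Uyghur hyphenation trick; U+200B is a
# different zero-width character; U+002D is the ASCII hyphen.
for cp in (0x200D, 0x200B, 0x002D):
    print(describe(cp))
```

Both zero-width characters are reported with category Cf, while the hyphen is ordinary punctuation.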

Testing locally with hb-shape, I get a zero-width <space> (as expected):

$ hb-unicode-encode 062A 0626 06D5 06CB 200d 2d | hb-shape amiri-regular.ttf
[hyphen=5+754|space=4+0|uni06CB=3+821|uni06D5.fina=2+777|uni0626.medi_BaaBaaInit=1+345|uni062A.init_BaaBaaIsol=0+654]

$ hb-unicode-encode 200b | hb-shape amiri-regular.ttf
[space=0+0]

This suggests there's something odd about how you're using HarfBuzz.
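
To compare these numbers against what another caller reports, the hb-shape output can be parsed mechanically. The following is only a sketch: parse_hb_shape and GLYPH_RE are hypothetical names, and the pattern assumes the default text format of name=cluster, an optional @x,y offset, then +advance, as seen in the output above. It does not handle glyph flags or extents.

```python
import re

# Assumed item format: name=cluster[@dx,dy]+x_advance[,y_advance]
GLYPH_RE = re.compile(
    r"^(?P<name>[^=]+)=(?P<cluster>\d+)"
    r"(?:@(?P<dx>-?\d+),(?P<dy>-?\d+))?"
    r"\+(?P<adv>-?\d+)(?:,(?P<yadv>-?\d+))?$"
)


def parse_hb_shape(line):
    """Parse one line of hb-shape output into (name, cluster, x_advance) tuples."""
    body = line.strip().strip("[]")
    glyphs = []
    for item in body.split("|"):
        m = GLYPH_RE.match(item)
        if m is None:
            raise ValueError(f"unrecognised glyph item: {item!r}")
        glyphs.append((m["name"], int(m["cluster"]), int(m["adv"])))
    return glyphs


sample = (
    "[hyphen=5+754|space=4+0|uni06CB=3+821|uni06D5.fina=2+777|"
    "uni0626.medi_BaaBaaInit=1+345|uni062A.init_BaaBaaIsol=0+654]"
)
glyphs = parse_hb_shape(sample)

# List every glyph whose advance is zero, in font units.
print([g for g in glyphs if g[2] == 0])  # [('space', 4, 0)]
```

Running the same check on the values a client receives from the shaper would show whether the non-zero width comes from HarfBuzz itself or is introduced afterwards, for example by looking up the glyph's metrics in the font again.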

JK
