OK, after an enjoyable detour into the source code I found that it is not a "bug", but rather a "feature" of Harfbuzz. The code responsible for this behaviour is in "hb-ot-layout-gpos-table.hh", line 1034 (function OT::MarkBasePosFormat1::apply):
/* We only want to attach to the first of a MultipleSubst sequence.
Reject others. */
if (0 == get_lig_comp (c->buffer->info[skippy_iter.idx])) break;
and the commit message of the revision that introduced this code reads:
This is apparently what Uniscribe does. Test case is:
SEEN FATHA TEH ALEF
with Arabic Typesetting. Originally reported by Khaled Hosny.
This explains why I saw the same result in Uniscribe... Basically, this code
refuses to attach a mark to a base glyph that is not the first glyph produced
by an earlier multiple substitution, and instead tries to attach the mark to
the first glyph of that sequence.
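To make the effect concrete, here is a minimal sketch of the backward search
as I understand it. It is not the HarfBuzz code: the Glyph struct, find_base
and make_example are hypothetical names of mine, and I am assuming that the
first glyph of a MultipleSubst sequence (like any glyph outside one) carries
component number 0. The buffer is the Latin test case from my first message,
after "e" has been replaced by "o" "e".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a shaped glyph (not hb_glyph_info_t).
struct Glyph {
    std::string name;
    bool is_mark;
    unsigned lig_comp;  // assumed: 0 = first glyph of a MultipleSubst sequence, or not part of one
};

// Walk backwards from the mark at mark_idx and return the index of the glyph
// the mark would be attached to, or -1 if there is none.
// With first_component_only set, later glyphs of a MultipleSubst sequence are
// skipped, which mimics the Uniscribe-compatible check quoted above.
int find_base(const std::vector<Glyph>& buf, std::size_t mark_idx,
              bool first_component_only) {
    assert(mark_idx < buf.size());
    for (std::size_t i = mark_idx; i-- > 0;) {
        const Glyph& g = buf[i];
        if (g.is_mark) continue;  // marks are never used as a base
        if (first_component_only && g.lig_comp != 0) continue;  // "Reject others"
        return static_cast<int>(i);
    }
    return -1;
}

// "a e gravecomb i" after the substitution e -> o e (when preceded by "a").
std::vector<Glyph> make_example() {
    return {
        {"a", false, 0},
        {"o", false, 0},  // first glyph of the substituted sequence
        {"e", false, 1},  // second glyph of the substituted sequence
        {"gravecomb", true, 0},
        {"i", false, 0},
    };
}
```

With first_component_only set to true the search for the gravecomb at index 3
stops at "o" (index 1), which the positioning rule does not cover, so the mark
stays unattached. With it set to false the search stops at "e" (index 2),
which is the base I intended.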
However, in my case, this is not the desired behavior. I am writing Biblical
and liturgical Hebrew. One of the most complete fonts to implement a proper
treatment of the complex Hebrew diacritics is SBL Hebrew. Its layout
intelligence was made open source, so there are many fonts (see the culmus
project) that implement it. However, in some particular cases (*), because of
the way the font was designed, in Harfbuzz the positioning behavior results in
a misalignment of the Hebrew letters and their diacritics.
Moreover, in Uniscribe the rendering is actually different from that of the
current version of Harfbuzz (and, in my eyes, strange...). In Adobe InDesign,
Fontforge, Mellel and ConTeXt Mark IV the rendering of SBL Hebrew is correct
IMHO. For a visual comparison I am attaching screenshots of the rendering of
the string קיָ֦גג in Uniscribe (Wordpad), Mellel (the most correct), the
original version of Harfbuzz and a version of Harfbuzz where I edited
OT::MarkBasePosFormat1::apply.
I understand that Harfbuzz works this way for compatibility with Uniscribe.
However, I would like to know whether this positioning behavior is a
deliberate design decision for Harfbuzz, or something that is open to change
in the future.
(*) Namely, in the string "קיָ֦" = qof+vav+qamats+merkha kefula, SBL Hebrew
substitutes "hairspace+vav" for the vav after the qof, and then applies the
"qamats+merkha" diacritics to the yod. Harfbuzz tries to apply them to the
hairspace instead of the yod, and fails.
---
Tom
<<inline: wordpad.png>>
<<inline: mellel.png>>
<<inline: harfbuzz-original.png>>
<<inline: harfbuzz-modified.png>>
On 07 Oct 2013, at 20:42, Rolf Langenhuijzen wrote:

> Hi Tom,
>
> If I try your rtf with a simple test then it looks OK to me (see png).
> hb-view --output-file=m.png --font-size=100 minimal.ttf aoèièi
>
> this is hb 0.9.21 with ot shaper
>
> Rolf
> <m.png>
>
> On Oct 7, 2013, at 1:13 AM, [email protected] wrote:
>
>> I am just beginning to try Harfbuzz, but I am writing to you because I think
>> that I might have found incorrect behavior when I have both a contextual
>> chained substitution and a contextual chained positioning.
>>
>> The problems occur when I have the following two rules:
>> 1 Substitute ["e"] with ["o" "e"] when preceded by an "a" (context: { ["a"]
>> | } )
>> 2 Position the mark ["gravecomb"] anchoring it to the ["e"] when the mark is
>> followed by an "i" (context: { | "i" } )
>>
>> What I think I should see when I type ["a" "e" "gravecomb" "i" "e"
>> "gravecomb" "i" ] should be something like [aoèièi]
>> What I see is more like [aoeˋièi] (the first "gravecomb" is not anchored to
>> the "e")
>> I used the characters "a", "e", "i", "o", "gravecomb" (U+0300) but the
>> problem is not specific to those characters and persists even in right to
>> left scripts. I found while examining the font SBLHebrew and the string
>> "קוָ֣".
>>
>> I built a very minimal font that reproduces this problem with the latin
>> characters I used for the example. I put online the Fontforge source
>> <https://www.dropbox.com/s/a78cypqv3jgmaex/prova.sfd> and the ttf
>> <https://www.dropbox.com/s/5hq1c5mdg4isvzo/minimal.ttf>
>>
>> However, the fact that the problem is reproduced almost exactly on
>> Uniscribe, and even in the Proofing tool of MS VOLT makes me wonder if it is
>> a bug or not. The problem is not present on the shaping system of ConTeXt
>> Mark IV and on Apple's TextEdit, so it is even more mysterious for me.
>>
>> I also put the link of the (IMHO correct) rendering of Fontforge
>> <http://s23.postimg.org/8w44n9b3v/Screenshot_from_2013_10_07_00_55_45.png>
>> and of the rendering of hb-view
>> <http://s14.postimg.org/p02lzc29t/Screenshot_from_2013_10_07_00_58_32.png>
>> (in order to render it with "hb-view --language=dflt --features="calt,kern"
>> '/home/mint/Desktop/minimal.ttf' aèièi", be aware that the è is composed of
>> two characters, U+0065 and U+0300, because the software tends to convert
>> this sequence to the single U+00E8 character). The problem is not with the
>> spacing (in my font the "gravecomb" has nonzero width, but it's a mark, so
>> its width is somewhat undefined) but with the fact that the first accent is
>> not attached to the first "e".
>>
>> --
>> Tom
>> _______________________________________________
>> HarfBuzz mailing list
>> [email protected]
>> http://lists.freedesktop.org/mailman/listinfo/harfbuzz
>
