Re: [HarfBuzz] Hangul GSUB features

Jonathan Kew Sat, 25 Jan 2014 11:18:10 -0800

On 25/1/14 17:36, [email protected] wrote:

     * Conditional on some assessment of the structure of the syllable
       (perhaps the existence of a precomposed glyph?) the *jmo features may
       be applied - presumably to the output of ccmp, if it was applied.


Yes - remembering that the decision as to which *jmo feature, if any, applies
to a given glyph was made *before* ccmp, and knows nothing about any changes
that happened there.


What happens to these decisions when ccmp makes substitutions?  If we have
a single glyph L tagged for ljmo and ccmp replaces it with a single glyph,
is the new glyph also tagged for ljmo?


Yes.

If we have something like L tagged
for ljmo followed by LV not tagged, and ccmp replaces the pair of them
with a single LLV glyph, will the LLV glyph be tagged?

Yes (at least, I think that's right - it'd be worth double-checking).

However, note that if you have, say, LV (not tagged for any *jmo feature) followed by T (tagged tjmo) and replace the pair with LVT, I don't think the resulting LVT will inherit the tjmo. When GSUB does a many-to-one substitution, the result inherits the feature flags of the first glyph in the input sequence, and the feature flags of the subsequent glyph(s) are lost.

If we have
something like a single LLL glyph tagged for ljmo (the shaper would do
that, right?) and ccmp splits it into three glyphs L L L, which if any of
the new glyphs will inherit the tagging status of the original?

Yes. One-to-many will duplicate the features of the one to its many replacements.
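
The inheritance rules described above for many-to-one and one-to-many substitutions can be sketched as a toy model. This is only an illustration in Python, not HarfBuzz code: the `Glyph` class and the `ligate` and `decompose` helpers are hypothetical names, and the per-glyph flags are modelled as a plain set of feature tags.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    name: str
    features: frozenset  # feature tags assigned to this glyph before GSUB runs


def ligate(glyphs, start, count, new_name):
    """Many-to-one: the result keeps the flags of the FIRST input glyph only."""
    first = glyphs[start]
    merged = Glyph(new_name, first.features)
    return glyphs[:start] + [merged] + glyphs[start + count:]


def decompose(glyphs, index, new_names):
    """One-to-many: every replacement glyph copies the flags of the original."""
    src = glyphs[index]
    parts = [Glyph(n, src.features) for n in new_names]
    return glyphs[:index] + parts + glyphs[index + 1:]


# LV (untagged) + T (tjmo) -> LVT: the tjmo flag of the second glyph is lost.
lvt = ligate([Glyph("LV", frozenset()), Glyph("T", frozenset({"tjmo"}))], 0, 2, "LVT")

# L (ljmo) + LV (untagged) -> LLV: the ljmo flag of the first glyph survives.
llv = ligate([Glyph("L", frozenset({"ljmo"})), Glyph("LV", frozenset())], 0, 2, "LLV")

# LLL (ljmo) -> L L L: all three results carry ljmo.
lll = decompose([Glyph("LLL", frozenset({"ljmo"}))], 0, ["L", "L", "L"])
```

In this model `lvt[0].features` is empty, `llv[0].features` contains `ljmo`, and each of the three glyphs in `lll` contains `ljmo`, matching the three cases discussed above.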

The two problems you're facing, I think, with the current harfbuzz code in relation to the use of *jmo in your font are that:

(a) precomposed characters (LV, LVT) do not get tagged for any *jmo features, and if you decompose them with ccmp, the resulting glyphs still aren't tagged for *jmo (unlike the case where the shaper decomposes them); and

(b) sequences with multiple L, V and/or T jamos are not recognized as matching the <L, V [,T]?> pattern, and so do not get tagged for *jmo. In something like <L, L, L, V, V, V, T, T, T>, the only two glyphs that would be tagged for *jmo features would be the adjacent <L, V> pair; all the rest would be considered "not part of a valid syllable" and left untagged.
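
The tagging behaviour in (a) and (b) can be approximated with a small sketch. It is a simplification of what is described above, not the shaper's actual logic: the function name `tag_jamo` and the class labels `"L"`, `"V"`, `"T"`, `"LV"`, `"LVT"` are assumptions made for illustration, and the input is a list of jamo classes rather than real glyphs.

```python
def tag_jamo(classes):
    """Return one *jmo tag (or None) per input item.

    Only an L immediately followed by V, optionally followed by one T,
    is treated as a syllable; everything else is left untagged.
    """
    tags = [None] * len(classes)
    i = 0
    while i < len(classes):
        if classes[i] == "L" and i + 1 < len(classes) and classes[i + 1] == "V":
            tags[i] = "ljmo"
            tags[i + 1] = "vjmo"
            i += 2
            # Optional single trailing consonant.
            if i < len(classes) and classes[i] == "T":
                tags[i] = "tjmo"
                i += 1
        else:
            # Precomposed LV/LVT and stray jamos fall through untagged.
            i += 1
    return tags


simple = tag_jamo(["L", "V", "T"])
stacked = tag_jamo(["L", "L", "L", "V", "V", "V", "T", "T", "T"])
precomposed = tag_jamo(["LV", "T"])
```

Here `simple` is `["ljmo", "vjmo", "tjmo"]`, while in `stacked` only the third L and the first V receive tags, and `precomposed` is entirely untagged.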

But if you ignore the *jmo features altogether, and do everything in a series of ccmp lookups, I don't see why it shouldn't work as you intend.

If I go this route, defining no *jmo tables, can I depend on ccmp and liga
always being applied and always in that order?


Currently, at least in harfbuzz, ccmp and liga (and the *jmo features, when
used) are all applied "together", with the order of lookups being their order


What does applying them "together" mean?  Is it just that nothing other
than feature application is done in between applying features, or are
they somehow simultaneous?  In other words, does the output of each one
become the input of the next, or are they all looking at the same input
with the output somehow recombined?

What actually happens is more like the description in http://www.microsoft.com/typography/otspec/chapter2.htm:

"After choosing which features to use, the client assembles all lookups from the selected features. Multiple lookups may be needed to define the data required for different substitution and positioning actions, as well as to control the sequencing and effects of those actions. To implement features, a client applies the lookups in the order the lookup definitions occur in the LookupList. As a result, within the GSUB or GPOS table, lookups from several different features may be interleaved during text processing."

So for the L glyph in an <L, V, T> sequence, for example, the selected features will include ljmo, as well as the "global" features ccmp and liga (and others such as rlig, locl, etc.) We collect a list of all the lookups from all these features, and apply those lookups in the order they're defined in the font's LookupList, *not* in any predetermined feature order.
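
As a rough sketch of that collection step, the following Python models lookups as simple one-to-one substitution maps and applies them in LookupList index order, restricted to glyphs that carry one of the features referencing the lookup. The function `shape` and the glyph names `A`, `A.c`, `A.l` are hypothetical; real lookups (contextual, ligature, and so on) are considerably more involved.

```python
def shape(glyphs, lookup_list, feature_lookups):
    """Apply lookups in LookupList order, not in feature order.

    glyphs:          list of (name, tags) pairs; tags is the set of features
                     active on that glyph.
    lookup_list:     list of dicts mapping glyph name -> replacement name.
    feature_lookups: dict mapping feature tag -> list of lookup indices.
    """
    # Record which features reference each lookup index.
    owners = {}
    for tag, indices in feature_lookups.items():
        for idx in indices:
            owners.setdefault(idx, set()).add(tag)

    out = list(glyphs)
    for idx in sorted(owners):  # LookupList order
        table = lookup_list[idx]
        out = [
            (table.get(name, name), tags) if tags & owners[idx] else (name, tags)
            for name, tags in out
        ]
    return [name for name, _ in out]


buf = [("A", {"ccmp", "liga"})]

# The liga lookup sits at index 0, so it runs BEFORE the ccmp lookup.
liga_first = shape(buf, [{"A.c": "A.l"}, {"A": "A.c"}], {"ccmp": [1], "liga": [0]})

# The same two lookups stored in the other order.
ccmp_first = shape(buf, [{"A": "A.c"}, {"A.c": "A.l"}], {"ccmp": [0], "liga": [1]})
```

With the liga lookup stored first, `liga_first` is `["A.c"]`, because the liga rule sees the buffer before ccmp has produced `A.c`. With ccmp stored first, `ccmp_first` is `["A.l"]`.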

Some shapers - particularly the Indic one - do apply features in separate passes, because (unfortunately) that's how Microsoft chose to implement their Indic fonts and shaper, but we have not found this to be necessary for Hangul, and would prefer to avoid it.


If I have glyphs L V T, with features ljmo and vjmo run in that order
(glyph L tagged for ljmo and glyph V tagged for vjmo), and I want ljmo to
change L into L.alt and vjmo to change V into V.alt, should vjmo contain a
rule like "sub L.alt V' T" or like "sub L V' T"?

As you'll see from the above, this depends on how you order the lookups (rather than on a fixed feature order imposed by the shaper).


I thought that with multiple lookups in a single feature, substitution
would still stop as soon as it found a match - so that the multiple
lookups have the same effect as a single long lookup, with the advantages
over really using a single long lookup being that using more than one
allows sharing parts of tables among separate features, and splitting into
more than one table allows representing runs of simpler rules in more
concise table formats.

But some quick experiments with FontForge suggest that in fact (at least
in FontForge) it's as you imply:  with multiple lookups in a feature, each
one is applied to the output of the previous one.  Thanks for bringing
that to my attention!  It will make things a lot easier for me.

Perhaps you were confusing this with the case of multiple *subtables* within a single *lookup*. In this case, once a match occurs in one of the subtables, the lookup is considered to have finished, and the following subtables are not applied.

But multiple *lookups* within a single *feature* are definitely supported and used.
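
The difference between subtables and lookups can be illustrated with a minimal sketch. It assumes single-substitution rules only and uses hypothetical helper names (`apply_lookup`, `apply_feature`); it is not how any particular shaper is implemented.

```python
def apply_lookup(glyphs, subtables):
    """One lookup: at each glyph, the first matching subtable wins."""
    out = []
    for g in glyphs:
        for sub in subtables:
            if g in sub:
                out.append(sub[g])
                break  # later subtables are skipped for this glyph
        else:
            out.append(g)  # no subtable matched
    return out


def apply_feature(glyphs, lookups):
    """Several lookups: each one sees the output of the previous one."""
    for subtables in lookups:
        glyphs = apply_lookup(glyphs, subtables)
    return glyphs


# Two subtables in ONE lookup: the second never sees the result of the first.
one_lookup = apply_lookup(["a"], [{"a": "b"}, {"b": "c"}])

# Two LOOKUPS of one subtable each: the substitutions chain.
two_lookups = apply_feature(["a"], [[{"a": "b"}], [{"b": "c"}]])
```

In this sketch `one_lookup` is `["b"]`, whereas `two_lookups` is `["c"]`.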


Something else I hadn't realized, but have just now verified at least in
the case of FontForge, was that the order of tables in the font can
override the "ccmp must be applied first" rule.  I thought that was
advice for renderers, but apparently it's the font's responsibility to
implement it by putting ccmp first in the file.


Yes - again, see above.

I have not tested whether Uniscribe behaves this way for Hangul, or whether it runs the features separately (as seems to be implied by the old documentation). Provided you design your lookups to be applied in the documented ccmp/ljmo/vjmo/tjmo/liga order *and* arrange the lookups this way in the font, it shouldn't matter whether the shapers run them "all at once" according to the generic OpenType spec or in separate passes.



JK

