On 25/1/14 17:36, [email protected] wrote:

     * Conditional on some assessment of the structure of the syllable
       (perhaps the existence of a precomposed glyph?) the *jmo features may
       be applied - presumably to the output of ccmp, if it was applied.

Yes - remembering that the decision as to which *jmo feature, if any, applies
to a given glyph was made *before* ccmp, and knows nothing about any changes
that happened there.

What happens to these decisions when ccmp makes substitutions?  If we have
a single glyph L tagged for ljmo and ccmp replaces it with a single glyph,
is the new glyph also tagged for ljmo?

Yes.

If we have something like L tagged
for ljmo followed by LV not tagged, and ccmp replaces the pair of them
with a single LLV glyph, will the LLV glyph be tagged?

Yes (at least, I think that's right - it'd be worth double-checking). However, note that if you have, say, LV (not tagged for any *jmo feature) followed by T (tagged tjmo) and replace the pair with LVT, I don't think the resulting LVT will inherit the tjmo. When GSUB does a many-to-one substitution, the result inherits the feature flags of the first glyph in the input sequence, and the feature flags of the subsequent glyph(s) are lost.

If we have
something like a single LLL glyph tagged for ljmo (the shaper would do
that, right?) and ccmp splits it into three glyphs L L L, which if any of
the new glyphs will inherit the tagging status of the original?

Yes. One-to-many will duplicate the features of the one to its many replacements.
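
To make the inheritance rules above concrete, here is a minimal Python sketch. It is a toy model, not harfbuzz code: the Glyph class and the ligate and decompose helpers are hypothetical names invented for illustration, and the many-to-one behaviour is modelled exactly as described above (first glyph's flags win), which the reply itself says is worth double-checking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    # Hypothetical stand-in for a buffer entry: a glyph name plus the
    # set of *jmo features the shaper tagged it with.
    name: str
    features: frozenset = frozenset()


def ligate(glyphs, result_name):
    """Many-to-one: the result keeps the flags of the first input glyph only."""
    return Glyph(result_name, glyphs[0].features)


def decompose(glyph, result_names):
    """One-to-many: every output glyph gets a copy of the input's flags."""
    return [Glyph(name, glyph.features) for name in result_names]


L = Glyph("L", frozenset({"ljmo"}))
LV = Glyph("LV")                          # precomposed, not tagged
T = Glyph("T", frozenset({"tjmo"}))
LLL = Glyph("LLL", frozenset({"ljmo"}))

llv = ligate([L, LV], "LLV")              # first glyph is L, so ljmo survives
lvt = ligate([LV, T], "LVT")              # first glyph is LV, so tjmo is lost
parts = decompose(LLL, ["L", "L", "L"])   # all three carry ljmo
```

In this model the order of the inputs to a ligature is what matters: a tag survives only if it sits on the first glyph of the matched sequence.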

The two problems you're facing, I think, with the current harfbuzz code in relation to the use of *jmo in your font are that:

(a) precomposed characters (LV, LVT) do not get tagged for any *jmo features, and if you decompose them with ccmp, the resulting glyphs still aren't tagged for *jmo (unlike the case where the shaper decomposes them); and

(b) sequences with multiple L, V and/or T jamos are not recognized as matching the <L, V [,T]?> pattern, and so do not get tagged for *jmo. In something like <L, L, L, V, V, V, T, T, T>, the only two glyphs that would be tagged for *jmo features would be the adjacent <L, V> pair; all the rest would be considered "not part of a valid syllable" and left untagged.
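
As a rough illustration of point (b), the following Python sketch tags only glyphs that fall inside a strict <L, V [,T]?> match. The function name tag_jamo and the one-letter class labels are assumptions made for this example; this is a simplified model of the behaviour described above, not the actual harfbuzz implementation.

```python
def tag_jamo(classes):
    """Return one tag (or None) per glyph, given classes such as "L", "V", "T".

    Only a single L immediately followed by a single V, optionally followed
    by a single T, counts as a syllable in this simplified model.
    """
    tags = [None] * len(classes)
    i = 0
    while i < len(classes):
        if classes[i] == "L" and i + 1 < len(classes) and classes[i + 1] == "V":
            tags[i], tags[i + 1] = "ljmo", "vjmo"
            i += 2
            # The optional T must directly follow the V to be tagged.
            if i < len(classes) and classes[i] == "T":
                tags[i] = "tjmo"
                i += 1
        else:
            i += 1
    return tags


simple = tag_jamo(["L", "V", "T"])
stacked = tag_jamo(list("LLLVVVTTT"))
```

For the stacked sequence only the third L and the first V end up tagged, which is the adjacent <L, V> pair mentioned above; no T is adjacent to that V, so none of the T glyphs is tagged.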

But if you ignore the *jmo features altogether, and do everything in a series of ccmp lookups, I don't see why it shouldn't work as you intend.

If I go this route, defining no *jmo tables, can I depend on ccmp and liga
always being applied and always in that order?

Currently, at least in harfbuzz, ccmp and liga (and the *jmo features, when
used) are all applied "together", with the order of lookups being their order

What does applying them "together" mean?  Is it just that nothing other
than feature application is done in between applying features, or are
they somehow simultaneous?  In other words, does the output of each one
become the input of the next, or are they all looking at the same input
with the output somehow recombined?

What actually happens is more like the description in http://www.microsoft.com/typography/otspec/chapter2.htm:

"After choosing which features to use, the client assembles all lookups from the selected features. Multiple lookups may be needed to define the data required for different substitution and positioning actions, as well as to control the sequencing and effects of those actions. To implement features, a client applies the lookups in the order the lookup definitions occur in the LookupList. As a result, within the GSUB or GPOS table, lookups from several different features may be interleaved during text processing."

So for the L glyph in an <L, V, T> sequence, for example, the selected features will include ljmo, as well as the "global" features ccmp and liga (and others such as rlig, locl, etc.). We collect a list of all the lookups from all these features, and apply those lookups in the order they're defined in the font's LookupList, *not* in any predetermined feature order.
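
The collection step can be sketched in a few lines of Python. The feature-to-lookup mapping below is a made-up font, and collect_lookups is a hypothetical helper; the point is only that the resulting order comes from the LookupList indices, not from the order in which features are named.

```python
def collect_lookups(feature_to_lookups, selected_features):
    """Union of lookup indices from the selected features, in LookupList order."""
    indices = set()
    for feature in selected_features:
        indices.update(feature_to_lookups.get(feature, []))
    return sorted(indices)


# Hypothetical font: each number is a position in the GSUB LookupList.
feature_to_lookups = {
    "ccmp": [0, 3],
    "ljmo": [1],
    "liga": [2, 4],
}

# The order in which the features are listed makes no difference.
order = collect_lookups(feature_to_lookups, ["liga", "ljmo", "ccmp"])
```

With this made-up font, ccmp's second lookup (index 3) runs after liga's first lookup (index 2), so lookups from different features are interleaved.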

Some shapers - particularly the Indic one - do apply features in separate passes, because (unfortunately) that's how Microsoft chose to implement their Indic fonts and shaper, but we have not found this to be necessary for Hangul, and would prefer to avoid it.


If I have glyphs L V T, with features ljmo and vjmo run in that order
(glyph L tagged for ljmo and glyph V tagged for vjmo), and I want ljmo to
change L into L.alt and vjmo to change V into V.alt, should vjmo contain a
rule like "sub L.alt V' T" or like "sub L V' T"?

As you'll see from the above, this depends on how you order the lookups (rather than on a fixed feature order imposed by the shaper).
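
A small Python simulation may help show why the answer depends on lookup order. Everything here is a hypothetical toy: the lookups are plain functions over glyph names, and the model assumes that only the marked glyph (V') has to carry the vjmo tag while the context glyphs are matched by name alone.

```python
def ljmo_lookup(glyphs, tags):
    """Single substitution: L -> L.alt on glyphs tagged ljmo."""
    return ["L.alt" if g == "L" and t == "ljmo" else g
            for g, t in zip(glyphs, tags)]


def make_vjmo_lookup(before):
    """Contextual substitution: sub <before> V' T by V.alt, on glyphs tagged vjmo."""
    def lookup(glyphs, tags):
        out = list(glyphs)
        for i in range(1, len(glyphs) - 1):
            if (tags[i] == "vjmo" and glyphs[i] == "V"
                    and glyphs[i - 1] == before and glyphs[i + 1] == "T"):
                out[i] = "V.alt"
        return out
    return lookup


def shape(glyphs, tags, lookups):
    """Apply lookups in list order; each sees the output of the previous one."""
    for lookup in lookups:
        glyphs = lookup(glyphs, tags)
    return glyphs


glyphs, tags = ["L", "V", "T"], ["ljmo", "vjmo", "tjmo"]

# ljmo lookup first: the vjmo rule must name L.alt as its context.
ljmo_first_alt = shape(glyphs, tags, [ljmo_lookup, make_vjmo_lookup("L.alt")])
ljmo_first_plain = shape(glyphs, tags, [ljmo_lookup, make_vjmo_lookup("L")])

# vjmo lookup first: the vjmo rule still sees the original L.
vjmo_first_plain = shape(glyphs, tags, [make_vjmo_lookup("L"), ljmo_lookup])
```

In this toy model, if the ljmo lookup comes earlier in the LookupList the vjmo rule has to be written against L.alt, and if it comes later the rule has to be written against L.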



I thought that with multiple lookups in a single feature, substitution
would still stop as soon as it found a match - so that the multiple
lookups have the same effect as a single long lookup, with the advantages
over really using a single long lookup being that using more than one
allows sharing parts of tables among separate features, and splitting into
more than one table allows representing runs of simpler rules in more
concise table formats.

But some quick experiments with FontForge suggest that in fact (at least
in FontForge) it's as you imply:  with multiple lookups in a feature, each
one is applied to the output of the previous one.  Thanks for bringing
that to my attention!  It will make things a lot easier for me.

Perhaps you were confusing this with the case of multiple *subtables* within a single *lookup*. In this case, once a match occurs in one of the subtables, the lookup is considered to have finished, and the following subtables are not applied.

But multiple *lookups* within a single *feature* are definitely supported and used.
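
The difference between subtables and lookups can be sketched as follows. This is a deliberately tiny Python model restricted to single-glyph substitutions, with subtables represented as dictionaries; apply_lookup and apply_feature are hypothetical names, not part of any real library.

```python
def apply_lookup(glyph, subtables):
    """Within one lookup, the first subtable that matches ends the lookup."""
    for subtable in subtables:
        if glyph in subtable:
            return subtable[glyph]
    return glyph


def apply_feature(glyph, lookups):
    """Across lookups, each one is applied to the output of the previous one."""
    for subtables in lookups:
        glyph = apply_lookup(glyph, subtables)
    return glyph


a_to_b = {"a": "b"}
b_to_c = {"b": "c"}

# One lookup with two subtables: a -> b, then the lookup stops.
one_lookup_two_subtables = apply_lookup("a", [a_to_b, b_to_c])

# Two lookups with one subtable each: a -> b, then b -> c.
two_lookups = apply_feature("a", [[a_to_b], [b_to_c]])
```

The same two rules give "b" when packed as subtables of one lookup and "c" when split into two lookups.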


Something else I hadn't realized, but have just now verified at least in
the case of FontForge, was that the order of tables in the font can
override the "ccmp must be applied first" rule.  I thought that was
advice for renderers, but apparently it's the font's responsibility to
implement it by putting ccmp first in the file.

Yes - again, see above.

I have not tested whether Uniscribe behaves this way for Hangul, or whether it runs the features separately (as seems to be implied by the old documentation). Provided you design your lookups to be applied in the documented ccmp/ljmo/vjmo/tjmo/liga order *and* arrange the lookups this way in the font, it shouldn't matter whether the shapers run them "all at once" according to the generic OpenType spec or in separate passes.


JK
