On Fri, 15 Jun 2018 17:53:41 -0500 Nathan Willis <[email protected]> wrote:
> On Wed, Jun 6, 2018 at 2:29 PM, Richard Wordingham <[email protected]> wrote:
>
> > On Tue, 5 Jun 2018 09:42:38 -0500
> > Nathan Willis <[email protected]> wrote:
> >
> > > Your feedback and help is appreciated!
> >
> > * Malayalam Remarks *
> >
> > In Sections 2.2 and 2.3, how are multiple vowels handled, such as
> > U+0D4A and U+0D4B? I'm particularly interested in the handling of
> > multiple left matras.
>
> Hmm. So, as I understand it, in HarfBuzz the presence of multiple
> matras (on any side) would be an issue dealt with by the
> syllable-identification regular expressions, before getting to the
> reordering stuff.
>
> It seems like this is what is used (the same regexps being used for
> all scripts in HarfBuzz's Indic shaper):
>
> matra_group = z{0,3}.M.N?.(H | forced_rakar)?;
> [...]
> halant_or_matra_group = (final_halant_group | (H.ZWJ)?
> matra_group{0,4});
>
> ... and that only permits four matras (total) per syllable.
>
> I vaguely recall seeing a commit message or comment or something
> indicating that this limit was there to maintain compatibility with
> how Uniscribe matches syllables, but I searched around and couldn't
> find it today. It was something along the lines of the Microsoft docs
> saying "one matra for each type [L,R,T,B] is permitted," but that
> isn't clear whether it's justified by orthography at all or is just a
> practical concession that they made for some reason.

It looks more like a desire to prohibit as many unusual combinations as
they can.

> Others with more Uniscribe knowledge may know.
>
> That having been said, I *think* that HarfBuzz doesn't rearrange two
> adjacent codepoints that have the same sort-ordering tags. So
> "Consonant,U+0D4A,U+0D4B" ought to get the matras decomposed, then
> the two left-side parts move together as-is to the left of the
> consonant, and the two right-side parts remain unchanged.
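
The decompose-then-move behaviour described in the paragraph above can
be sketched as a toy model. This is not HarfBuzz's code: the
LEFT_MATRAS set, the decompose() and toy_reorder() helpers, and the
assumption that the first code point is the base are all made up for
illustration. Only the canonical decompositions come from the Unicode
data that ships with Python.

```python
import unicodedata

# Assumption for illustration: the Malayalam vowel signs that sit to the
# left of the base (U+0D46 SIGN E, U+0D47 SIGN EE, U+0D48 SIGN AI).
LEFT_MATRAS = {0x0D46, 0x0D47, 0x0D48}


def decompose(cp):
    """Return the canonical decomposition of one code point as a list of ints."""
    mapping = unicodedata.decomposition(chr(cp))
    if not mapping or mapping.startswith("<"):
        return [cp]  # no decomposition, or only a compatibility one
    return [int(part, 16) for part in mapping.split()]


def toy_reorder(codepoints):
    """Decompose split matras, then move the left-side parts before the base.

    Hypothetical model: the base is assumed to be the first code point,
    and the relative order inside each group is preserved.
    """
    base, *rest = codepoints
    parts = [p for cp in rest for p in decompose(cp)]
    left = [p for p in parts if p in LEFT_MATRAS]
    other = [p for p in parts if p not in LEFT_MATRAS]
    return left + [base] + other


def fmt(codepoints):
    """Format code points as U+XXXX for printing."""
    return " ".join(f"U+{cp:04X}" for cp in codepoints)


if __name__ == "__main__":
    # KA followed by U+0D4A and U+0D4B, the sequence discussed above.
    print(fmt(toy_reorder([0x0D15, 0x0D4A, 0x0D4B])))
    # prints: U+0D46 U+0D47 U+0D15 U+0D3E U+0D3E
```

Under these assumptions the two left-side halves keep their relative
order in front of KA and the two right-side halves keep theirs after
it. Whether a real shaper does the same has to be checked with hb-view.
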
>
> You could test that with
> hb-view /usr/share/fonts/truetype/noto/NotoSerifMalayalam-Regular.ttf
> --unicodes=0d15,0d4a,0d4b

A more revealing test case is

hb-view /usr/share/fonts/truetype/ttf-indic-fonts-core/Meera_04.ttf --unicodes=0d15,0d4b,0d4c

Remembering that U+0D4B decomposes to <U+0D47 SIGN EE, U+0D3E SIGN AA>
and U+0D4C decomposes to <U+0D46 SIGN E, U+0D57 AU LENGTH MARK>, it
yields the bizarre sequence <g0D47, g0D46, gKA, g0D3E, g0D57>, in
complete violation of the inside-out rule for combining marks in white
man's scripts. This behaviour should be documented in some fashion.

The behaviour of USE is worthy of comparison. The sequence <U+1A20 TAI
THAM LETTER HIGH KA, U+1A70 TAI THAM VOWEL SIGN OO, U+1A6E TAI THAM
VOWEL SIGN E, U+1A63 TAI THAM VOWEL SIGN AA>, which is at best a
lexicographer's convention, is rendered <gOO, gE, gKA, gAA> in MS Edge
but as <gE, gOO, gKA, gAA> by HarfBuzz, which in this case observes the
inside-out rule.

> > In Section 3, how does tagging interact with substitutions?
> > Features can in general split and merge glyphs.
>
> The tagging described in stage 2 is just the reordering /
> syllable-position tags. So after all that is done, the
> sort-the-syllable-into-final-sort-order is (AIUI) the last that the
> tags come into play.
> I do know that HarfBuzz keeps track of other sorts of state that it
> may refer to internally as tags, but I don't think any of these docs
> reference those, just the reordering position tags.
>
> So applying the features in stage 3 doesn't interact with the tags --
> at least, not directly. If the tagging was wrong, of course, then the
> final sorted order might be wrong and sequences wouldn't match up to
> the substitution rules in GSUB. But, if I follow HarfBuzz's logic
> right, the reordering stuff cannot be switched off, so it always
> happens completely before any substitutions start, and that seems to
> be what other shapers did first.
>
> Should there be a wording change to address that in the document
> itself?

In the Indian Indic scripts, there are reportedly three steps:

Initial reordering
'Mandatory' substitutions
Final reordering

Are you saying that the 'final reordering' is a null-op? This cannot be
the case. The rendering of <U+0926 DA, U+094D VIRAMA, U+0926 DA, U+093F
SIGN I> depends on whether there is a conjunct D.DA in the font.
Assuming DA has no formal half-form, there are two possible normal
renderings:

<gDA, gVIRAMA, gI, gDA> and <gI, gD.DA>

What gets moved when? You say the initial reordering 'may mean moving
dependent-vowel (matra) glyphs', and then say, 'The final reordering
stage repositions marks, dependent-vowel (matra) signs, and "Reph"
glyphs to the appropriate location with respect to the base consonant'.

In the USE, there is no initial reordering. In my code-revealing font,
Dalekh Si, which is designed for use with a spell-checker (and works
well in Firefox), I split preposed vowels into a part that moves and an
ink-free part that stays put. I use the ink-free part to colour
consonants that follow vowels within the akshara. Now this works in MS
Edge and Firefox, but I don't know whether I'm just lucky.

Richard.
_______________________________________________
HarfBuzz mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/harfbuzz
