On 15/06/2018 15:53, Nathan Willis wrote:

It seems like this is what is used (the same regexps are used for all scripts in HarfBuzz's Indic shaper):

matra_group = z{0,3}.M.N?.(H | forced_rakar)?;
[...]
halant_or_matra_group = (final_halant_group | (H.ZWJ)? matra_group{0,4});

... and that only permits four matras (total) per syllable.
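
To make the effect of the {0,4} repetition concrete, here is a minimal sketch in Python. It is not HarfBuzz's actual Ragel grammar: the category map, the SYLLABLE pattern, and the split_matras helper are all hypothetical simplifications that only model "a base followed by at most four matras" and ignore z, N, H and forced_rakar entirely.

```python
import re

AA_MATRA = "\u093E"  # DEVANAGARI VOWEL SIGN AA

def to_categories(text):
    # Hypothetical, heavily simplified classification: the AA matra is "M",
    # every other character is treated as a base "B".
    return "".join("M" if ch == AA_MATRA else "B" for ch in text)

# Simplified stand-in for the quoted rule: one base, then up to four matras.
SYLLABLE = re.compile(r"B(M{0,4})")

def split_matras(text):
    """Return (absorbed, leftover) matra counts for the first syllable."""
    cats = to_categories(text)
    match = SYLLABLE.match(cats)
    if match is None:
        return (0, 0)
    absorbed = len(match.group(1))
    rest = cats[match.end():]
    # Matras immediately after the match could not join the syllable.
    leftover = len(rest) - len(rest.lstrip("M"))
    return (absorbed, leftover)

# U+0906 (AA) followed by six AA matras and U+0939 (HA).
stretched = "\u0906" + AA_MATRA * 6 + "\u0939"
print(split_matras(stretched))
```

Under these assumptions a fifth or later matra cannot be part of the same syllable match; what the shaper then does with the leftover marks is outside the scope of this sketch.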

I vaguely recall seeing a commit message or comment or something indicating that this limit was there to maintain compatibility with how Uniscribe matches syllables, but I searched around and couldn't find it today. It was something along the lines of the Microsoft docs saying "one matra for each type [L,R,T,B] is permitted," but it isn't clear from that whether the limit is justified by orthography at all or is just a practical concession that they made for some reason.

Others with more Uniscribe knowledge may know.

Indeed, the spec at https://docs.microsoft.com/en-us/typography/script-development/devanagari#analyze-the-text says "matra (up to one of each type: pre-, above-, below- or post- base)"

However, I'm not sure it's a good idea to enforce this restriction. While "normal" spelling may abide by it, in casual writing people sometimes like to use repeated matras, just as an English speaker might write "Helloooooooo!"

E.g. see https://www.xossip.com/showthread.php?t=1498145, where the writer uses a number of "stretched-out" spellings (search in the page for आाााााााााााााह, for example).

JK