On Thu, 26 Feb 2015 09:09:31 -0800 Behdad Esfahbod <[email protected]> wrote:
> Hi Richard,
>
> I was away for a few weeks. I'm glad you and Roozbeh got into
> discussion. Working with him and Andrew is indeed the best way
> forward. Note that as you observed, SEA is very liberal in what it
> accepts. That's simply because we didn't know any better.

Actually, Martin Hosken once presented a very similar production to the UTC, but without the algebraic simplification. No-one remarked that there wasn't much that it disallowed.

The very first word in the MFL dictionary is <HIGH KA, SIGN U, TONE-2, SIGN AA, SAKOT, NA, SAKOT, NGA> /kaankuĊ/ 'to prosper', with two(!) final consonants attached below the visually final vowel. It renders fine in LibreOffice at the moment, thanks to HarfBuzz. I wrestled with the problem of encoding this word phonetically, and failed.

> What would immensely help is to gather sequences that you (and
> others) think should be considered one syllable. We can then add
> these to Roozbeh's indic repository as test data (with the USE
> grammar). That will be extremely valuable, and I'm willing to set up
> the code to run the tests.

I take it you're looking for a regular expression. Would this be a regular expression for strings of symbols, rather than traces? (Traces are defined from strings by allowing certain pairs of 'letters' to commute; fully decomposed character strings under canonical equivalence are the example that interests us. The theory gets messy with Kleene star.)

I notice that USE seems, judging from the Buginese and some (all?) of the Tibetan overrides, to work by matching NFD strings against the patterns. May I assume a suitable permutation of the non-zero canonical combining classes?

Alternatively, are you just looking for a probing test set of real words? I can tackle the Tai Tham script. Other scripts are likely to get a sketchy treatment from me, probably based on what I can glean from the encoding proposals.

Richard.
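To make the trace idea concrete: under canonical equivalence, two adjacent combining marks commute exactly when both have non-zero and different canonical combining classes. NFD picks one representative string per trace by sorting each run of such marks into ascending class order. Below is a minimal sketch using Python's standard unicodedata module. The helper names are illustrative only, and Latin marks stand in for Tai Tham ones simply because their classes are familiar.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings belong to the same trace iff their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def commute(x: str, y: str) -> bool:
    """Adjacent fully decomposed marks x, y may be swapped under canonical
    equivalence iff both have non-zero and distinct combining classes."""
    cx, cy = unicodedata.combining(x), unicodedata.combining(y)
    return cx != 0 and cy != 0 and cx != cy

ACUTE = "\u0301"       # COMBINING ACUTE ACCENT, class 230 (above)
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, class 220 (below)
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT, class 230 (above)

# Different non-zero classes: the marks commute, so both orders are one trace.
print(commute(ACUTE, DOT_BELOW))                                                  # True
print(canonically_equivalent("a" + ACUTE + DOT_BELOW, "a" + DOT_BELOW + ACUTE))   # True

# Same class: the marks do not commute, so the two orders stay distinct.
print(commute(ACUTE, CIRCUMFLEX))                                                 # False
print(canonically_equivalent("a" + ACUTE + CIRCUMFLEX, "a" + CIRCUMFLEX + ACUTE)) # False
```

A pattern matched against NFD strings therefore only needs to accept commuting marks in ascending class order. Marks that share a class keep their written order, so if both orders are acceptable the pattern has to list both.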
_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
