On 15-02-26 11:04 AM, Richard Wordingham wrote:
>> What would immensely help is to gather sequences that you (and
>> others) think should be considered one syllable. We can then add
>> these to Roozbeh's indic repository as test data (with the USE
>> grammar). That will be extremely valuable, and I'm willing to set up
>> the code to run the tests.
>
> I take it you're looking for a regular expression. Would this be a
> regular expression for strings of symbols, rather than traces? (Traces
> are defined from strings by allowing certain pairs of 'letters' to
> commute - fully decomposed character strings under canonical
> equivalence are the example that interests us. The theory gets messy
> with Kleene star.) I notice USE seems, from the Buginese and some
> (all?) of the Tibetan overrides, to be working by matching NFD strings
> against the patterns. May I assume a suitable permutation of the
> non-zero canonical combining classes?
>
> Alternatively, are you just looking for a probing test set of real
> words?

Real or fictional words. Whatever you think should be considered a
syllable for these purposes.

-- 
behdad
http://behdad.org/
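
For anyone preparing test sequences, here is a rough illustration of the canonical-equivalence point raised in the quoted question. It is only a sketch using Python's standard unicodedata module, not a description of how USE or HarfBuzz matches anything. The helper names and the Latin sample strings are made up for the example. It shows that two marks with different non-zero canonical combining classes may be swapped without changing the NFD form, while two marks with the same class may not.

```python
import unicodedata


def ccc_sequence(s):
    """Return the canonical combining class of each code point in s."""
    return [unicodedata.combining(ch) for ch in s]


def canonically_equivalent(a, b):
    """Return True if a and b have the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Hypothetical sample data: dot below (U+0323, class 220) and acute
# (U+0301, class 230) have different non-zero classes, so they commute.
below_first = "a\u0323\u0301"
above_first = "a\u0301\u0323"

# Acute (U+0301) and grave (U+0300) share class 230, so their relative
# order is significant and they do not commute.
acute_grave = "a\u0301\u0300"
grave_acute = "a\u0300\u0301"

if __name__ == "__main__":
    print(canonically_equivalent(below_first, above_first))
    print(canonically_equivalent(acute_grave, grave_acute))
    print(ccc_sequence(unicodedata.normalize("NFD", above_first)))
```

A test set that lists each syllable in only one mark order could be expanded with a check like canonically_equivalent() to see whether another submitted order is just a canonical reordering of an existing entry.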
