Hi, Martin,

On Thu, Apr 28, 2011 at 10:47 PM, Martin Hosken <[email protected]> wrote:
> Dear Ed,
>
>> Hi, Behdad,
>
> Hmm I seem to have been locked out of the harfbuzz list. Behdad could you
> look into it at some point. TIA.
>
>> Now that everything is working fine, I am going to start taking a look
>> at writing shaper code to handle Tai Tham and related Southeast Asian
>> scripts (New Tai Lue, Tai Viet, and Myanmar; eventually probably also
>> adding Cambodian).
>
> I don't think trying to support all of these scripts in a single shaper is a
> good idea.
>
> Tai Viet can be handled, like Thai, using the generic shaper

RE: Tai Viet -- Ooops, my bad! OK, I had not read Chapter 4, "Character Properties", of the Unicode Standard since something like version 3.0, and just assumed that Tai Viet, having been encoded more recently, would follow the logical model used for Tai Tham, Khmer, etc. rather than the visual model of Thai and Lao. I stand corrected -- and that's one less script to have to worry about! :-)

> New Tai Lue is the simplest script to shape given there is no OT lookup
> interaction required
>
> Why clog these two scripts up with having to do all the Tai Tham tests? I
> think writing a Tai Tham shaper would be great. Do you intend to add this
> shaper to all OT engines?

Well, please feel free to tell me if I am being very naïve here, but what I have been thinking of doing is writing somewhat generalized "reordering" code that will shift the "reordrant" combining mark characters (as listed in Table 4-4 of the Unicode Standard 6.0) to the front of their clusters, before glyph substitution occurs. In theory, albeit possibly only my own naïve theory, this could be done in a fairly generic fashion for a reasonable number, if not all, of the nineteen Brahmic-derived scripts listed in that table.

Relevant properties of the individual code points in the string buffer would be looked up in script-specific tables. I am considering using bit masks to tag the table's code point entries by category, so that a single code point can carry multiple categories as required:

==> U+1A6E TAI THAM VOWEL SIGN E would be "DEPENDENT_VOWEL | REORDRANT"
==> U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA would be "CONSONANT | REORDRANT"

Naturally, other bit masks will be defined as required. For example, we know that a syllable (cluster) may start with either a CONSONANT or an INDEPENDENT_VOWEL, so it might be useful to have a bit mask category called CAN_BEGIN_CLUSTER.
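
To make the bit-mask idea concrete, here is a minimal C sketch. The CAT_* names, their bit values, the table layout and the helper functions are all hypothetical, chosen for illustration; they are not HarfBuzz API. One design point the sketch brings out: U+1A55 is tagged CONSONANT but is a combining sign that cannot start a syllable, so CAN_BEGIN_CLUSTER gets its own bit here rather than being defined as CONSONANT | INDEPENDENT_VOWEL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical category bits; a code point may carry several at once. */
enum {
    CAT_CONSONANT         = 1u << 0,
    CAT_INDEPENDENT_VOWEL = 1u << 1,
    CAT_DEPENDENT_VOWEL   = 1u << 2,
    CAT_REORDRANT         = 1u << 3,
    /* Separate bit: set only on characters that may start a syllable. */
    CAT_CAN_BEGIN_CLUSTER = 1u << 4
};

/* One entry of a hypothetical script-specific property table. */
typedef struct {
    uint32_t code_point;
    unsigned categories; /* bitwise OR of CAT_* values */
} CategoryEntry;

/* Illustrative excerpt only; a real table would cover the whole block. */
static const CategoryEntry tai_tham_table[] = {
    { 0x1A20, CAT_CONSONANT | CAT_CAN_BEGIN_CLUSTER }, /* LETTER HIGH KA */
    { 0x1A55, CAT_CONSONANT | CAT_REORDRANT },         /* CONSONANT SIGN MEDIAL RA */
    { 0x1A6E, CAT_DEPENDENT_VOWEL | CAT_REORDRANT },   /* VOWEL SIGN E */
};

/* Returns the category bits for cp, or 0 if cp is not in the table. */
static unsigned lookup_categories(uint32_t cp) {
    size_t n = sizeof tai_tham_table / sizeof tai_tham_table[0];
    for (size_t i = 0; i < n; i++)
        if (tai_tham_table[i].code_point == cp)
            return tai_tham_table[i].categories;
    return 0;
}

/* True if cp carries at least one bit in mask. Called with
 * CAT_CAN_BEGIN_CLUSTER, this single test stands in for
 * "is it a CONSONANT or an INDEPENDENT_VOWEL that can start a syllable?". */
static int has_category(uint32_t cp, unsigned mask) {
    assert(mask != 0); /* an empty mask would always be false */
    return (lookup_categories(cp) & mask) != 0;
}
```

The linear scan only keeps the sketch short; a real table would more likely be indexed by the code point's offset within the Unicode block.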
That way, when looking for the beginning of a cluster, we can just test for CAN_BEGIN_CLUSTER instead of testing separately for "CONSONANT or INDEPENDENT_VOWEL". Basically, whatever tests have to be performed against individual code points determine which bit-mask categories should be defined. In theory, this will make the code and the associated tables easy for humans to read and understand.

As I currently envision things, this "reorderer" works in two passes:

1. The first pass finds REORDRANT DEPENDENT_VOWELs and moves them to the beginning of their respective clusters. In the simplest case, the reordrant dependent vowel gets swapped with just a single consonant: this is the only case that exists for New Tai Lue. Consonant clusters in other scripts may be written with several letters, so for those scripts we may have to shift the reordrant dependent vowel several places.

2. The second pass finds REORDRANT CONSONANTs (i.e., MEDIAL RA). As far as I am aware at this point in my education, a reordrant medial RA only needs to swap places with the single consonant character preceding it, so this case is as simple as Jonathan's implementation for New Tai Lue.

Note that my plan at this point is only to (attempt to) write the "reordering" code. Beyond HarfBuzz's generic OT shaper, other features may still need to be implemented to fully shape various other Brahmic-derived scripts. For Tai Tham, I think the generic shaper plus the reordering described here is enough. For other scripts, like Khmer, some additional features will be necessary. And for a lot of other scripts, I'm sure I don't know!

>
> Just because a shaper involves reordering doesn't mean that we only need one
> such shaper.
>

I would love to hash this out in a round-table discussion with you, Jonathan Kew, Theppitak Karoonboonyanan, Danh Hong, and everyone else who understands the details of how these Brahmic-derived scripts work.
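
The two passes described above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not a HarfBuzz implementation: the buffer is a plain array of code points (before any glyph substitution), the category bits and three-entry lookup are hypothetical, and the start of a cluster is simply taken to be the nearest preceding character tagged CAN_BEGIN_CLUSTER. That last assumption covers single-consonant syllables, but scripts whose consonant clusters span several letters would need real syllable segmentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical category bits (illustrative values). */
enum {
    CAT_CONSONANT         = 1u << 0,
    CAT_DEPENDENT_VOWEL   = 1u << 2,
    CAT_REORDRANT         = 1u << 3,
    CAT_CAN_BEGIN_CLUSTER = 1u << 4
};

/* Hypothetical lookup covering only three Tai Tham code points. */
static unsigned categories_of(uint32_t cp) {
    switch (cp) {
    case 0x1A20: return CAT_CONSONANT | CAT_CAN_BEGIN_CLUSTER; /* LETTER HIGH KA */
    case 0x1A55: return CAT_CONSONANT | CAT_REORDRANT;         /* CONSONANT SIGN MEDIAL RA */
    case 0x1A6E: return CAT_DEPENDENT_VOWEL | CAT_REORDRANT;   /* VOWEL SIGN E */
    default:     return 0;
    }
}

/* True if cp carries every bit in mask. */
static int has_all(uint32_t cp, unsigned mask) {
    return (categories_of(cp) & mask) == mask;
}

/* Pass 1: move each reordrant dependent vowel to the start of its cluster,
 * taken here to be the nearest preceding CAN_BEGIN_CLUSTER character. */
static void reorder_dependent_vowels(uint32_t *buf, size_t len) {
    for (size_t i = 1; i < len; i++) {
        if (!has_all(buf[i], CAT_DEPENDENT_VOWEL | CAT_REORDRANT))
            continue;
        size_t start = i;
        int found = 0;
        while (start > 0) {
            start--;
            if (has_all(buf[start], CAT_CAN_BEGIN_CLUSTER)) {
                found = 1;
                break;
            }
        }
        if (!found)
            continue; /* no base before the vowel: leave it in place */
        uint32_t vowel = buf[i];
        /* Shift buf[start .. i-1] right by one, then put the vowel in front. */
        memmove(&buf[start + 1], &buf[start], (i - start) * sizeof buf[0]);
        buf[start] = vowel;
    }
}

/* Pass 2: swap each reordrant consonant (e.g. MEDIAL RA) with the single
 * non-reordrant consonant immediately before it. */
static void reorder_medial_consonants(uint32_t *buf, size_t len) {
    for (size_t i = 1; i < len; i++) {
        if (has_all(buf[i], CAT_CONSONANT | CAT_REORDRANT) &&
            has_all(buf[i - 1], CAT_CONSONANT) &&
            !has_all(buf[i - 1], CAT_REORDRANT)) {
            uint32_t tmp = buf[i - 1];
            buf[i - 1] = buf[i];
            buf[i] = tmp;
        }
    }
}

/* Runs both passes, in order, over a buffer of code points. */
static void reorder_reordrants(uint32_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    reorder_dependent_vowels(buf, len);
    reorder_medial_consonants(buf, len);
}
```

For the logical sequence KA + MEDIAL RA + VOWEL SIGN E (U+1A20 U+1A55 U+1A6E), the sketch yields U+1A6E U+1A55 U+1A20: pass 1 moves the vowel in front of KA, and pass 2 then swaps MEDIAL RA with KA. Whether that is exactly the order Tai Tham fonts expect is an assumption that would need checking against real fonts.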

Should HarfBuzz really have separate shapers for Tai Tham vs. Myanmar vs. New Tai Lue vs. Khmer? Or can these be reasonably combined as suggested here? Or, going even further, can the "reordering" code be generalized to handle all 19 Brahmic-derived scripts shown in Table 4-4, while other script-specific aspects of shaping are broken out elsewhere?

Best -
Ed

> Yours,
> Martin
>
_______________________________________________
HarfBuzz mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/harfbuzz
