I'm creating some hyphenation rules for Jarai texts that I'm interlinearizing. Here's the problem: in various texts, a complex character such as LATIN SMALL LETTER A WITH BREVE might be encoded as a single code point (U+0103) or as a sequence of code points (LATIN SMALL LETTER A, U+0061, plus COMBINING BREVE, U+0306). The \hyphenation{} command does not treat the two encodings as the same, which means I have to list two versions of a word if it has one accented character, four versions if it has two accented characters, eight versions if it has three, etc. For example:
\hyphenation{hơ-nuă hơ-nuă hơ-nuă hơ-nuă} (because O WITH HORN can be two code points or one) Is there a simple way to tell (Xe)LaTeX to treat precomposed and uncomposed characters identically without having to put in all the possibilities?
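To make the blow-up concrete, here is a rough Python sketch (outside of TeX, and only an illustration of the problem, not a fix) that enumerates the spellings I currently have to list by hand. The helper name `encoding_variants` is made up for this example, and it assumes each accented letter has just two spellings, fully precomposed or fully decomposed; letters carrying more than one diacritic can have additional partially composed forms that it does not generate.

```python
import itertools
import unicodedata


def encoding_variants(word):
    """Return every precomposed/decomposed spelling mix of `word`.

    Hypothetical helper: each character gets at most two spellings,
    its NFC form and its NFD form.
    """
    choices = []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        # Plain letters and the hyphen have only one spelling.
        choices.append((ch,) if decomposed == ch else (ch, decomposed))
    return ["".join(parts) for parts in itertools.product(*choices)]


# U+01A1 is O WITH HORN, U+0103 is A WITH BREVE.
variants = encoding_variants("h\u01a1-nu\u0103")
print(len(variants))  # 4
print(r"\hyphenation{" + " ".join(variants) + "}")
```

All four strings normalize to the same NFC (or NFD) form, so they are canonically equivalent as far as Unicode is concerned; the count doubles with every additional accented letter.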