So, I guess I was foolish to hope that Google had figured out how to return results for strings that are non-identical but equivalent?
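For illustration, here is a minimal sketch using Python's standard `unicodedata` module. The two spellings of a-with-breve from the quoted message below are different code point sequences, so a plain comparison treats them as different strings. They only match once both are brought to the same normalization form (NFC here; NFD would work equally well). The helper name `canonically_equal` is hypothetical.

```python
import unicodedata

precomposed = "\u0103"   # LATIN SMALL LETTER A WITH BREVE (one code point)
decomposed = "a\u0306"   # LATIN SMALL LETTER A + COMBINING BREVE (two code points)

def canonically_equal(s1: str, s2: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", s1) == unicodedata.normalize("NFC", s2)

print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True: canonically equivalent
```

A search tool that does not normalize (or otherwise account for canonical equivalence) is effectively doing the first comparison.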
I hope it's not too off-topic for this list, but can you point me to any good resources on normalization? Is there a straightforward way to automate it for someone who doesn't do scripting? And am I supposed to use decomposed characters?

Thanks.
Josh

On Fri, Jul 8, 2011 at 3:11 PM, maxwell <maxw...@umiacs.umd.edu> wrote:
> On Fri, 8 Jul 2011 15:00:42 -0500, Joshua and Amy <josh.ruth...@gmail.com>
> wrote:
> > I'm creating some hyphenation rules for Jarai texts that I'm
> > interlinearizing. Here's the problem: In various texts, a complex character
> > such as LATIN SMALL LETTER A WITH BREVE might be encoded as a single code
> > point (U+0103) or as a combination of code points (LATIN SMALL LETTER A:
> > U+0061 plus COMBINING BREVE: U+0306).
>
> Can't (shouldn't!) you pass your texts through a Unicode normalization
> process? Otherwise search on them might not work either, depending on how
> smart your search tool is.
>
> Mike Maxwell
>
>
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex
>
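As a rough sketch of the normalization pass suggested above, this short Python script (standard library only) rewrites a UTF-8 text file into a single normalization form. The file name `normalize.py` and the function names are hypothetical. Either form works as long as every text is converted the same way; NFC (composed, e.g. U+0103) is a common default, while NFD (decomposed, e.g. U+0061 U+0306) is the alternative.

```python
import sys
import unicodedata

def normalize_text(text: str, form: str = "NFC") -> str:
    """Return text in the given Unicode normalization form (NFC, NFD, NFKC or NFKD)."""
    return unicodedata.normalize(form, text)

def normalize_file(src: str, dst: str, form: str = "NFC") -> None:
    """Read a UTF-8 file, normalize it, and write the result to dst.

    newline="" keeps the original line endings unchanged on both read and write.
    """
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(normalize_text(text, form))

if __name__ == "__main__":
    args = sys.argv[1:]
    if len(args) in (2, 3):
        normalize_file(*args)
    else:
        print("usage: python normalize.py INPUT OUTPUT [NFC|NFD]", file=sys.stderr)
```

Usage, with hypothetical file names: `python normalize.py jarai.txt jarai-nfc.txt` (add `NFD` as a third argument to decompose instead). Running every text through the same form before hyphenation or search means the hyphenation patterns only need to cover one spelling of each character.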