Unicode normalization was discussed on this list a couple of months ago. Phil Taylor provided a small program to do the job, and other utilities were referred to. There's also a parameter within XeTeX that normalizes Unicode input before TeX processes it: setting \XeTeXinputnormalization to 1 gives composed forms (NFC), and 2 gives decomposed forms (NFD). Try this in your header:

% Normalize any residual Unicode combining accents,
% and write out error messages, if any:
\XeTeXinputnormalization=1
\tracinglostchars=1
\tracingonline=1

Dominik

On 8 July 2011 22:50, Joshua and Amy <josh.ruth...@gmail.com> wrote:
> So, I guess I was foolish to hope that Google has figured out how to return
> results that have non-identical but equivalent strings?
>
> I hope it's not too off-topic for this list, but can you point me to any
> good resources on normalization (is there a straightforward automation for
> someone who doesn't do scripting? am I supposed to use decomposed
> characters?)?
>
> Thanks.
>
> Josh
>
> On Fri, Jul 8, 2011 at 3:11 PM, maxwell <maxw...@umiacs.umd.edu> wrote:
>
>> On Fri, 8 Jul 2011 15:00:42 -0500, Joshua and Amy
>> <josh.ruth...@gmail.com> wrote:
>> > I'm creating some hyphenation rules for Jarai texts that I'm
>> > interlinearizing. Here's the problem: In various texts, a complex
>> > character such as LATIN SMALL LETTER A WITH BREVE might be encoded as
>> > a single code point (U+0103) or as a combination of code points
>> > (LATIN SMALL LETTER A: U+0061 plus COMBINING BREVE: U+0306).
>>
>> Can't (shouldn't!) you pass your texts through a Unicode normalization
>> process? Otherwise search on them might not work either, depending on how
>> smart your search tool is.
>>
>> Mike Maxwell

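
For the pre-processing route Mike suggests in the quoted message, here is a minimal sketch in Python of what normalizing the two encodings of a-breve looks like. It is only an illustration, not the program Phil Taylor posted: the helper name to_nfc is made up for this example, and it relies on nothing beyond the standard unicodedata module.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (composed where possible).

    to_nfc is a hypothetical helper name used only for this sketch.
    """
    return unicodedata.normalize("NFC", text)


# The two encodings described in the quoted message.
precomposed = "\u0103"   # LATIN SMALL LETTER A WITH BREVE
decomposed = "a\u0306"   # LATIN SMALL LETTER A + COMBINING BREVE

# The raw strings differ code point by code point.
print(precomposed == decomposed)                  # False

# After normalization both are the single code point U+0103.
print(to_nfc(precomposed) == to_nfc(decomposed))  # True
```

Applying the same call to the whole contents of a text file before it reaches the hyphenation rules or a search tool would make both spellings compare equal. Passing "NFD" instead of "NFC" to unicodedata.normalize gives the decomposed form, if that is what the downstream tools expect.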
-------------------------------------------------- Subscriptions, Archive, and List information, etc.: http://tug.org/mailman/listinfo/xetex