On Tue, May 17, 2005 at 04:05:34PM -0700, Gregory K. Ruiz-Ade wrote:
> The part that's nailing me is the letters with diacritics.

Well, I know how I could do this in Python, though I imagine it's a general Unicode thing, not specific to Python, and I can't imagine Perl can't do it.

Unicode characters have something called a canonical decomposition. For example, for o with an acute accent (Unicode U+00F3, which I can't type in jed since it doesn't support Unicode :( ), the decomposition is 'o' followed by U+0301, the combining acute accent. So you could get the canonical decomposition of the character and just throw away the second part. In Python:

>>> from unicodedata import decomposition
>>> c = u'\xf3'
>>> print c
ó
>>> decomp = decomposition(c)
>>> print decomp
006F 0301
>>> parts = decomp.split()
>>> ordval = int(parts[0], 16)
>>> char = chr(ordval)
>>> print char
o

Dave Cook

--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
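
For a whole string rather than a single character, the same idea can be wrapped in a small function. This is only a rough sketch, written for Python 3 (where chr() handles any code point, unlike the Python 2 session above). The name strip_diacritics is made up for illustration. It assumes you only want to follow canonical decompositions, so it leaves alone anything whose decomposition starts with a tag such as <compat> or <fraction>, and it repeats the lookup because some characters decompose in more than one step.

```python
from unicodedata import decomposition


def strip_diacritics(text):
    """Replace each character with the first part of its canonical decomposition.

    Hypothetical helper for illustration; not part of any library.
    """
    out = []
    for ch in text:
        decomp = decomposition(ch)
        # An empty string means "no decomposition". A leading "<" marks a
        # compatibility mapping (e.g. "<fraction> 0031 2044 0032"), which
        # this sketch deliberately does not follow.
        while decomp and not decomp.startswith("<"):
            # Keep only the first code point and discard the rest.
            ch = chr(int(decomp.split()[0], 16))
            # Look again: U+01D8 decomposes to U+00FC U+0301, and U+00FC
            # decomposes further to U+0075 U+0308.
            decomp = decomposition(ch)
        out.append(ch)
    return "".join(out)


print(strip_diacritics("canci\xf3n"))  # prints: cancion
```

Combining marks that are already separate characters in the input pass through untouched, so text that is not precomposed would need another step (for example, filtering on unicodedata.combining()).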
