On Tue, May 17, 2005 at 04:05:34PM -0700, Gregory K. Ruiz-Ade wrote:

> The part that's nailing me is the letters with diacritics.

Well, I know how I could do this in Python, though I imagine it's a
general Unicode thing, not specific to Python, and I'd be surprised if
Perl couldn't do it.  Unicode characters have something called a canonical
decomposition.  For example, the decomposition of o with an acute accent
(U+00F3, u'\xf3' in Python, which I can't type in jed since it doesn't
support Unicode :( ) is 'o' followed by U+0301, the combining acute accent.
So you can get the canonical decomposition of the character and just throw
away the second part.  In Python:

>>> from unicodedata import decomposition
>>> c = u'\xf3'
>>> print c
ó
>>> decomp = decomposition(c)
>>> print decomp
006F 0301
>>> parts = decomp.split()
>>> ordval = int(parts[0], 16)
>>> char = unichr(ordval)
>>> print char
o
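Here's a slightly more general sketch, assuming Python 3 (the session above
is Python 2).  base_char() and strip_accents() are just names I made up for
illustration.  base_char() is the trick above wrapped in a function; it leaves
the character alone if there's no decomposition, or if the decomposition is a
compatibility one (those start with a tag like '<compat>').  One catch: the
decomposition is only one level deep, so something like U+1EA5 (a with
circumflex and acute) decomposes to a-with-circumflex plus the acute, and
keeping the first part still leaves an accent.  strip_accents() swaps in a
different technique: normalize the whole string to NFD, which decomposes
fully, then drop every combining mark.

```python
import unicodedata


def base_char(ch: str) -> str:
    """Keep the first code point of ch's canonical decomposition.

    Returns ch unchanged if it has no decomposition or only a
    compatibility one (tagged, e.g. '<compat> 0020 0308').
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return ch
    # e.g. '006F 0301' -> '006F' -> 'o'
    return chr(int(decomp.split()[0], 16))


def strip_accents(text: str) -> str:
    """Fully decompose text (NFD), then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(base_char("\u00f3"))                     # o
print(base_char("\u1ea5"))                     # â  (one level only)
print(strip_accents("\u1ea5"))                 # a
print(strip_accents("Ruiz-Adé, señor, naïve"))  # Ruiz-Ade, senor, naive
```

Neither approach touches letters that have no decomposition at all, such as
ø or ł, so those would need a hand-made mapping if they matter.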

Dave Cook

-- 
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
