IBM's ICU4J has a normalizer which should do what you need. It's a big
library, but if you deal with multilingual text often, it might make your
life easier.


-- 
Alex Murzaku
___________________________________________
 alex(at)lissus.com  http://www.lissus.com            

-----Original Message-----
From: stephane vaucher [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, December 10, 2002 2:58 PM
To: [EMAIL PROTECTED]
Subject: Accentuated characters


Hello everyone,

I wish to implement a TokenFilter that will remove accentuated 
characters so for example '�' will become 'e'. As I would rather not 
reinvent the wheel, I've tried to find something on the web and on the 
mailing lists. I saw a mention of a contrib that could do this (see 
http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg02146.html),

but I don't see anything applicable.

Has anyone done this yet, if so I would much appreciate some pointers 
(or code), otherwise, I'll be happy to contribute whatever I produce 
(but it might be very simple since I'll only need to deal with french).

Cheers,
Stephane


--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>

<<attachment: winmail.dat>>

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Reply via email to