Thanks for the link. I've looked at it, and it has some interesting parts, such as the stop words and the analyser, which I might partially include (only partially, since I work with both English and French texts).
Cheers,
Stephane
Eric Isakson wrote:
I don't know whether any of the code in the French analyzer contributed by Patrick Talbot applies, but is there any reason you don't just use it? See http://nagoya.apache.org/eyebrowse/ReadMsg?[EMAIL PROTECTED]&msgNo=870
Eric
--
Eric D. Isakson SAS Institute Inc.
Application Developer SAS Campus Drive
XML Technologies Cary, NC 27513
(919) 531-3639 http://www.sas.com
-----Original Message-----
From: stephane vaucher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 10, 2002 2:58 PM
To: [EMAIL PROTECTED]
Subject: Accentuated characters
Hello everyone,
I wish to implement a TokenFilter that will remove accents from characters, so that, for example, 'é' becomes 'e'. As I would rather not reinvent the wheel, I've searched the web and the mailing lists. I saw a mention of a contrib that could do this (see http://www.mail-archive.com/lucene-user%40jakarta.apache.org/msg02146.html), but I don't see anything applicable.
Has anyone done this yet? If so, I would much appreciate some pointers (or code). Otherwise, I'll be happy to contribute whatever I produce (though it might be very simple, since I only need to deal with French).
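To make the idea concrete, here is a minimal sketch of the accent folding itself, kept separate from any Lucene class. It assumes a JDK that ships java.text.Normalizer (Java 6 or later); the class name AccentFolder and the method fold are hypothetical names chosen for illustration, not part of Lucene or of any existing contrib.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of Lucene.
final class AccentFolder {

    private AccentFolder() {
        // static utility, no instances
    }

    // Returns the term with combining accents removed,
    // e.g. e-acute (U+00E9) becomes a plain 'e'.
    static String fold(String term) {
        // NFD decomposition splits an accented letter into its base
        // letter followed by one or more combining marks.
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            // Drop the combining marks, keep every other character.
            if (Character.getType(c) != Character.NON_SPACING_MARK) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

A TokenFilter would then apply fold() to the text of each token it passes along; that wiring is left out here because it depends on the Lucene version in use. Note that ligatures such as 'œ' have no canonical decomposition, so this approach leaves them unchanged and they would need an explicit mapping.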
Cheers,
Stephane
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
