Hello all, I suspect my answer will involve Unicode, but I'd like to make sure that I am going down the right path here.
I have 100,000+ small HTML files that are mainly in English. I just noticed that some of our user names contain umlauts. These seem to be stored, and are searchable, as the '?' character. My code is based on the demo code that is provided with Lucene, under the 'demo' directory. What changes will I need to make to handle characters such as umlauts within English text?

Thanks,
IAP
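
A minimal sketch of one way a '?' can end up in place of an umlaut, in case it helps narrow things down. It assumes (this is not confirmed for the demo code) that the text passes through a charset that cannot represent the character somewhere between the file and the index. The class name CharsetDemo and the methods roundTrip and decode are hypothetical, and StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Encode text to bytes and decode it again with the same charset.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    // Decode raw bytes using an explicitly named charset instead of
    // relying on the platform default.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String name = "M\u00fcller"; // "Mueller" written with a u-umlaut

        // US-ASCII has no u-umlaut, so the encoder substitutes '?'.
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII).equals("M?ller"));

        // ISO-8859-1 and UTF-8 both preserve the character.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1).equals(name));
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));

        // Bytes written as ISO-8859-1 only decode correctly when the
        // same charset is named on the reading side.
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1).equals(name));
    }
}
```

If this is the cause, the likely fix is to name the files' actual encoding wherever they are opened for reading, for example by wrapping a FileInputStream in an InputStreamReader constructed with that charset, rather than relying on the platform default.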
