I don't have experience with those particular languages, but I can tell you that dealing with Unicode is mostly a matter of making sure you read the input using the correct encoding. Once the bytes are decoded into Java strings, Java will take care of the rest. If you are using a Reader for your Field, you probably have to do something like:
    new InputStreamReader(new FileInputStream(file), "UTF-8")

assuming your files are stored in UTF-8. If they use a different encoding, pass that encoding's name in place of "UTF-8".

I would do a Google search for stemmers and tokenizers for the languages you are interested in. I also believe someone had a "generic" stemmer that performed very well. I believe they posted to this list a week or so ago with a topic of "Writing a stemmer" or something along those lines.

>>> [EMAIL PROTECTED] 06/10/04 01:34AM >>>
Has anyone built Lucene for Devanagari Unicode search? Please help me understand what kind of changes I have to make in Lucene. Also, if anyone has built a StandardTokenizer, Analyzer, Stemmer, Indexer, or QueryParser for Hindi and Marathi, please let me know.

Thanks,
Satish.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
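
For reference, here is a minimal sketch of the encoding step described in the reply, assuming the files really are stored in UTF-8. The class name Utf8ReaderExample and the helpers openUtf8 and readAll are hypothetical names made up for illustration and are not part of Lucene. The code only shows how to get a correctly decoded Reader, which is what you would then hand to your Field.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper class, for illustration only (not a Lucene API).
public class Utf8ReaderExample {

    // Opens the file as a character stream, decoding the bytes explicitly
    // as UTF-8 instead of relying on the platform default encoding.
    // Assumption: the file is stored in UTF-8; pass another charset name
    // here if your files use a different encoding.
    static Reader openUtf8(String path) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(path), "UTF-8"));
    }

    // Reads everything from the reader into a String, which is handy for
    // checking that Devanagari text survived the decoding step.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

If the text comes back as question marks or mojibake when you print the result of readAll, the charset name probably does not match how the file was actually saved.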
