I don't have experience with those particular languages, but I can tell you that dealing 
with Unicode is mostly a matter of making sure you read the input using the correct 
encoding.  Java will take care of the rest, since its Strings and Readers work with 
Unicode characters internally.  If you are using a Reader for your Field, you probably 
have to do something like:

new InputStreamReader(new FileInputStream(file), "UTF-8")

assuming your files are stored in UTF-8.  If they use a different encoding, then you 
will have to pass that encoding's name in place of "UTF-8".
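
To make the encoding point concrete, here is a rough, self-contained sketch in plain Java 
(no Lucene classes involved).  The class name Utf8ReaderSketch and the helpers openReader 
and readAll are made up for illustration, as is the sample word.  It writes a short 
Devanagari string to a temporary file as UTF-8, then reads it back once with the matching 
encoding and once with a wrong one:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8ReaderSketch {

    // Hypothetical helper: opens a Reader that decodes the file's bytes
    // using the named encoding. A Reader like this is what you would hand
    // to the Field instead of a plain FileReader.
    static Reader openReader(File file, String encoding) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), encoding));
    }

    // Hypothetical helper: reads the whole file into a String using the
    // given encoding, so the decoded text can be inspected.
    static String readAll(File file, String encoding) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = openReader(file, encoding)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The word "Hindi" in Devanagari, written with Unicode escapes so the
        // example does not depend on the encoding of this source file.
        String hindi = "\u0939\u093F\u0928\u094D\u0926\u0940";

        // Assumption for this sketch: the input files are stored as UTF-8.
        File file = File.createTempFile("devanagari", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), hindi.getBytes(StandardCharsets.UTF_8));

        String right = readAll(file, "UTF-8");       // matching encoding
        String wrong = readAll(file, "ISO-8859-1");  // mismatched encoding

        System.out.println(right.equals(hindi)); // prints "true"
        System.out.println(wrong.equals(hindi)); // prints "false"
    }
}
```

The text only survives the round trip when the encoding passed to InputStreamReader 
matches the one the file was written in.  With the wrong encoding nothing fails loudly; 
you just get different characters, which is why this kind of problem tends to show up 
later as searches that find nothing.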

I would search Google for stemmers and tokenizers for the languages you are 
interested in.  I also believe someone had a "generic" stemmer that performed very 
well.  I believe they posted to this list a week or so ago with a subject of "Writing a 
stemmer" or something along those lines.

>>> [EMAIL PROTECTED] 06/10/04 01:34AM >>>

Has anyone built Lucene for Devanagari Unicode search? Please help me with what 
kind of changes I have to make in Lucene.

Also, if anyone has built a StandardTokenizer, Analyzer, Stemmer, Indexer, 
or QueryParser for Hindi and Marathi, please let me know.

Thanks,
Satish.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED] 
For additional commands, e-mail: [EMAIL PROTECTED] 


