Does Lucene's StandardAnalyzer work for all languages when tokenizing before
indexing (since we are using Java, I think the content is converted to UTF-8
before tokenizing/indexing)? Or do we need to use a special analyzer for each
language? In that case, if a document has mixed content (English +
Japanese), what analyzer should we use, and how can we figure it out
dynamically before indexing? The sketch below shows roughly what I had in mind.
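To make the question concrete, here is the kind of thing I was considering: detect the
language up front and route the text into a language-specific field with its own
analyzer. The detectLanguage() helper and the field names body_en/body_ja are made up
for illustration, CJKAnalyzer is only one possible choice for Japanese text, and I'm
assuming a Lucene version whose analyzers have no-argument constructors (older releases
take a Version argument).

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.cjk.CJKAnalyzer;
    import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;

    public class AnalyzerRouting {

        // Hypothetical helper: returns "ja" if the text contains Japanese script,
        // otherwise "en". A real implementation would use a language-detection library.
        static String detectLanguage(String text) {
            for (int i = 0; i < text.length(); ) {
                int cp = text.codePointAt(i);
                Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
                if (block == Character.UnicodeBlock.HIRAGANA
                        || block == Character.UnicodeBlock.KATAKANA
                        || block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
                    return "ja";
                }
                i += Character.charCount(cp);
            }
            return "en";
        }

        // One analyzer per language-specific field; StandardAnalyzer as the default.
        // This wrapper would be passed to IndexWriterConfig.
        static Analyzer buildAnalyzer() {
            Map<String, Analyzer> perField = new HashMap<>();
            perField.put("body_en", new StandardAnalyzer());
            perField.put("body_ja", new CJKAnalyzer());
            return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
        }

        // At index time: detect the language and put the text into the matching field,
        // so the analyzer registered for that field is the one applied.
        static Document buildDocument(String text) {
            Document doc = new Document();
            String field = "ja".equals(detectLanguage(text)) ? "body_ja" : "body_en";
            doc.add(new TextField(field, text, Field.Store.YES));
            return doc;
        }
    }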

Also, while searching, if the query text contains both English and
Japanese, how does this work? Are there any criteria for choosing the
analyzer? (A rough sketch of what I was imagining follows.)
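For the search side, this is roughly what I was picturing: reuse the same wrapper
analyzer at query time and run the query against both language fields, so mixed
English + Japanese input can still match. The field names are the hypothetical
ones from the indexing sketch above, and the MultiFieldQueryParser constructor
differs slightly between Lucene versions (older ones also take a Version argument).

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.search.Query;

    public class MixedLanguageSearch {
        // Parse the user's query text against both language-specific fields,
        // using the same PerFieldAnalyzerWrapper that was used at index time.
        public static Query parse(String userInput, Analyzer analyzer) throws ParseException {
            MultiFieldQueryParser parser =
                    new MultiFieldQueryParser(new String[] {"body_en", "body_ja"}, analyzer);
            return parser.parse(userInput);
        }
    }

Would that be a reasonable approach, or is there a more standard way to handle this?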

Thanks,
Sai


