I know this has been discussed several times, but I don't remember the answers offhand. Search the mail archive for "multiple languages" and you'll find some good suggestions. As I remember, though, it's not a trivial issue.
But I don't see why the "three different documents" approach wouldn't work. You could also index the same text in three different fields in a single document, using different language analyzers for each (see PerFieldAnalyzerWrapper).

Erick

On 2/22/07, Ivan Vasilev <[EMAIL PROTECTED]> wrote:
Hi All,

Our application uses Lucene for indexing, and it will have to index documents that each contain parts written in different languages. For example, a document could contain English, Chinese and Brazilian Portuguese text. How should we index such a document? Is there a best practice for this?

What comes to mind is to index 3 different Lucene Documents for each real document and to keep in a database the meta info that these 3 Documents belong to our real doc. For example, for myDoc.doc we would have myDocEn.doc, myDocCn.doc and myDocBr.doc in the index, and when the searched word is found in myDocCn.doc we would show myDoc.doc to the user.

The disadvantage is that in this case the occurrences of the searched item would have to be recalculated, which matters for queries like "Red NEAR/10 fox".

So if someone knows a better practice than this, please let me know.

Thanks in advance,
Ivan
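
For readers following up on the single-document suggestion in the reply, here is a minimal sketch of the idea in plain Java. It does not use the Lucene API: `SimpleAnalyzer`, `PerFieldAnalyzers`, `analyzeDocument` and the field names `body_en` and `body_zh` are all hypothetical stand-ins. They only show how one piece of text can be run through a different analyzer per field. In Lucene itself, PerFieldAnalyzerWrapper does this job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Dependency-free sketch of per-field analysis. None of these names are Lucene API. */
class PerFieldAnalysisSketch {

    /** Hypothetical stand-in for an analyzer: text in, tokens out. */
    interface SimpleAnalyzer {
        List<String> tokenize(String text);
    }

    /** Lowercases and splits on anything that is not a letter or digit. */
    static final SimpleAnalyzer WORDS = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    };

    /**
     * Emits overlapping character bigrams, a simple fallback for text without
     * word delimiters. Simplification: works on UTF-16 code units, so
     * characters outside the BMP are not handled.
     */
    static final SimpleAnalyzer BIGRAMS = text -> {
        String s = text.replaceAll("\\s+", "");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            tokens.add(s.substring(i, i + 2));
        }
        return tokens;
    };

    /** Picks an analyzer by field name and falls back to a default one. */
    static final class PerFieldAnalyzers {
        private final SimpleAnalyzer fallback;
        private final Map<String, SimpleAnalyzer> byField = new HashMap<>();

        PerFieldAnalyzers(SimpleAnalyzer fallback) {
            this.fallback = fallback;
        }

        PerFieldAnalyzers add(String field, SimpleAnalyzer analyzer) {
            byField.put(field, analyzer);
            return this;
        }

        List<String> tokenize(String field, String text) {
            return byField.getOrDefault(field, fallback).tokenize(text);
        }
    }

    /** Analyzes the same text once per field, keeping one logical document. */
    static Map<String, List<String>> analyzeDocument(
            PerFieldAnalyzers analyzers, String text, String... fields) {
        Map<String, List<String>> tokensByField = new LinkedHashMap<>();
        for (String field : fields) {
            tokensByField.put(field, analyzers.tokenize(field, text));
        }
        return tokensByField;
    }
}
```

Because every field receives the complete text, each field ends up with its own token sequence for the whole document. As far as I know Lucene also records token positions per field, so a proximity query such as "Red NEAR/10 fox" could run against one field without stitching positions together across three separate Documents. The cost is an index roughly three times the size.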