Randy Darling wrote:

Would it be ok to add an extra addDocument method to IndexWriter that would take an analyzer in addition to the document?

I am going to be indexing documents for multiple languages
and I would prefer to not have to reopen a writer for
each document that we are going to index.

I took a look at the code and it looks pretty straightforward,
and it didn't look like it would break anything.

I had the same problem, but I came up with a workaround that might help you. I wrote a facade analyzer that selects the appropriate language-specific analyzer just before each call to addDocument. Something like:


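// SwitchLangAnalyzer is my own facade class, not part of Lucene
SwitchLangAnalyzer sla = new SwitchLangAnalyzer(new Analyzer[] {
    new GermanAnalyzer(), new RussianAnalyzer(), new SwedishAnalyzer()});
IndexWriter iw = new IndexWriter(dir, sla, true);
// add a German doc: select the German sub-analyzer first
sla.select(0);
iw.addDocument(germanDoc);
// add a Russian doc: switch to the Russian sub-analyzer
sla.select(1);
iw.addDocument(russianDoc);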


...and so on.
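For anyone wondering what such a facade could look like inside, here is a minimal, self-contained sketch. It deliberately does not use Lucene: `TextAnalyzer` is a hypothetical stand-in for Lucene's `Analyzer` (the real class produces a `TokenStream` for a field), and the toy stop-word analyzers merely stand in for the German/Swedish ones. The idea is the same, though: the facade holds all sub-analyzers and forwards every call to whichever one was last chosen with `select`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SwitchLangAnalyzerSketch {

    // Hypothetical stand-in for Lucene's Analyzer: turns text into a list of tokens.
    interface TextAnalyzer {
        List<String> tokenize(String text);
    }

    // Facade: holds several sub-analyzers and delegates to the currently selected one.
    static class SwitchLangAnalyzer implements TextAnalyzer {
        private final TextAnalyzer[] analyzers;
        private int current = 0;

        SwitchLangAnalyzer(TextAnalyzer[] analyzers) {
            if (analyzers.length == 0) {
                throw new IllegalArgumentException("at least one analyzer is required");
            }
            this.analyzers = analyzers.clone();
        }

        // Must be called before adding each document in a given language.
        void select(int index) {
            if (index < 0 || index >= analyzers.length) {
                throw new IndexOutOfBoundsException("no analyzer at index " + index);
            }
            current = index;
        }

        @Override
        public List<String> tokenize(String text) {
            return analyzers[current].tokenize(text);
        }
    }

    // Toy language analyzer: lowercase, split on whitespace, drop stop words.
    static TextAnalyzer stopWordAnalyzer(Set<String> stopWords) {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty() && !stopWords.contains(t)) {
                    tokens.add(t);
                }
            }
            return tokens;
        };
    }

    public static void main(String[] args) {
        SwitchLangAnalyzer sla = new SwitchLangAnalyzer(new TextAnalyzer[] {
            stopWordAnalyzer(Set.of("der", "die", "das", "und")), // toy "German"
            stopWordAnalyzer(Set.of("och", "att", "en"))          // toy "Swedish"
        });
        sla.select(0);
        System.out.println(sla.tokenize("Der Hund und die Katze")); // [hund, katze]
        sla.select(1);
        System.out.println(sla.tokenize("Hunden och katten"));      // [hunden, katten]
    }
}
```

Because the selected language is mutable state shared with the writer, this approach assumes documents are added from a single thread, with `select` called before every `addDocument`.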

You need to be extra careful, though, about how you use such an index afterwards, especially if you use stemming or stop words. I also store a "lang" field, which I use to restrict searches to documents in a given language, and I use the same sub-analyzer for queries.
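To illustrate why the "lang" field and the matching query analyzer matter, here is a toy in-memory sketch (again not Lucene; every class and method name is hypothetical). Each document is analyzed with its own language's analyzer and tagged with that language; a search names a language, analyzes the query with that same analyzer, and only considers documents carrying that tag.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class LangFilteredIndexSketch {

    // Hypothetical toy analyzer: lowercase, split on whitespace, drop stop words.
    static Function<String, List<String>> analyzer(Set<String> stopWords) {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty() && !stopWords.contains(t)) tokens.add(t);
            }
            return tokens;
        };
    }

    // One indexed document: its id, its stored "lang" value and its index-time terms.
    static final class IndexedDoc {
        final String id;
        final String lang;
        final Set<String> terms;

        IndexedDoc(String id, String lang, Set<String> terms) {
            this.id = id;
            this.lang = lang;
            this.terms = terms;
        }
    }

    private final Map<String, Function<String, List<String>>> analyzersByLang;
    private final List<IndexedDoc> docs = new ArrayList<>();

    LangFilteredIndexSketch(Map<String, Function<String, List<String>>> analyzersByLang) {
        this.analyzersByLang = analyzersByLang;
    }

    private Function<String, List<String>> analyzerFor(String lang) {
        Function<String, List<String>> a = analyzersByLang.get(lang);
        if (a == null) throw new IllegalArgumentException("no analyzer for lang: " + lang);
        return a;
    }

    // Index time: analyze with the document's own language and store that language.
    void add(String id, String lang, String text) {
        docs.add(new IndexedDoc(id, lang, new HashSet<>(analyzerFor(lang).apply(text))));
    }

    // Query time: analyze with the SAME sub-analyzer and search only that language.
    List<String> search(String lang, String query) {
        List<String> queryTerms = analyzerFor(lang).apply(query);
        List<String> hits = new ArrayList<>();
        if (queryTerms.isEmpty()) return hits; // query consisted only of stop words
        for (IndexedDoc d : docs) {
            if (d.lang.equals(lang) && d.terms.containsAll(queryTerms)) hits.add(d.id);
        }
        return hits;
    }

    static LangFilteredIndexSketch demo() {
        LangFilteredIndexSketch index = new LangFilteredIndexSketch(Map.of(
            "en", analyzer(Set.of("the", "and")),
            "de", analyzer(Set.of("die", "der", "und"))));
        index.add("en1", "en", "The die is cast");
        index.add("de1", "de", "Die Katze und der Hund");
        return index;
    }

    public static void main(String[] args) {
        LangFilteredIndexSketch index = demo();
        System.out.println(index.search("en", "die"));   // [en1]
        System.out.println(index.search("de", "Die"));   // [] ("die" is a German stop word)
        System.out.println(index.search("de", "Katze")); // [de1]
        System.out.println(index.search("en", "Katze")); // [] (German doc filtered out by lang)
    }
}
```

The same query string "die" behaves differently depending on the language: as English it finds the English document, while as German it is dropped entirely as a stop word. Mixing analyzers between indexing and querying would produce exactly this kind of mismatch.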

--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)
