Re: Multi language indexing

karl wettin Mon, 07 May 2007 04:06:42 -0700


On 7 May 2007, at 12:16, bhecht wrote:

My question regarding "the way to go" was whether it is a good solution to index the content of a table using more than one analyzer, choosing the analyzer by the country value of each record.


I'm not sure what you mean, but I'll try.

Are you asking whether it makes sense to stem text based on the language of the text and put it in the same field no matter what language it is?

For the record, it usually makes very little sense to search text stemmed for one language with a query stemmed for another language. That is what you will end up doing if you store the stemmed text of every language in the same field. You could add another field called "language_iso" and add a boolean clause on it, but that would be overkill and would increase the response time.

In essence, it depends on your needs. For instance, are users supposed to find documents written in languages other than the one they specified? Do you want to limit searches to a single content language?

My guess is that you probably want to index the text unstemmed in an "unstemmed_text" field and stemmed in a language-specific field such as "stemmed_text_[language iso]". When searching, query both the unstemmed field and the stemmed field for the user's language, boosting the stemmed field.
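A minimal sketch of that field layout, in plain Java so it runs without Lucene on the classpath. The field names "unstemmed_text" and "stemmed_text_<iso>" follow the suggestion above; the helpers fieldsFor/queryFor, the pluggable stemmer hook and the boost value 2.0 are hypothetical choices for illustration. It assumes the query parser applies the matching language's stemming analyzer to each stemmed_text_<iso> field (for instance via Lucene's PerFieldAnalyzerWrapper), so the raw user query can be passed to both clauses.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class MultiLanguageFields {

    static final String UNSTEMMED_FIELD = "unstemmed_text";

    // Name of the language-specific stemmed field, e.g. "stemmed_text_sv".
    static String stemmedField(String languageIso) {
        return "stemmed_text_" + languageIso.toLowerCase(Locale.ROOT);
    }

    // Builds the field -> value map for one record: the original text goes
    // into the shared unstemmed field, the stemmed text into the field for
    // the record's own language. The stemmer is a hypothetical hook where a
    // real language-specific analyzer would be plugged in.
    static Map<String, String> fieldsFor(String text, String languageIso,
                                         UnaryOperator<String> stemmer) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(UNSTEMMED_FIELD, text);
        fields.put(stemmedField(languageIso), stemmer.apply(text));
        return fields;
    }

    // Builds a query-parser string that searches the unstemmed field and
    // the user's language-specific stemmed field, boosting the latter.
    static String queryFor(String userQuery, String userLanguageIso, float boost) {
        return UNSTEMMED_FIELD + ":(" + userQuery + ") "
                + stemmedField(userLanguageIso) + ":(" + userQuery + ")^" + boost;
    }

    public static void main(String[] args) {
        // Placeholder "stemmer" that only lowercases; NOT a real stemmer.
        UnaryOperator<String> placeholderStemmer = s -> s.toLowerCase(Locale.ROOT);

        Map<String, String> doc = fieldsFor("Running Shoes", "EN", placeholderStemmer);
        System.out.println(doc);
        System.out.println(queryFor("running shoes", "en", 2.0f));
    }
}
```

With the query parser's default OR operator both clauses are optional, so a document can still match on the exact unstemmed form alone, while matches in the stemmed field for the user's own language score higher.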


I hope this helps.

--
karl


