On 7 May 2007, at 12:16, bhecht wrote:

My question regarding "the way to go" was whether it is a good solution to index the content of a table using more than one analyzer, choosing the analyzer by the country value of each record.

I'm not sure what you mean, but I'll try.

Are you asking whether it makes sense to stem text based on its language and put it in the same field regardless of which language it is?

For the record, it usually makes very little sense to search text stemmed for one language with a query stemmed for another language. That is what you will end up doing if you store stemmed text in the same field regardless of language. You could add another field called "language_iso" and add a boolean clause on it, but that would be overkill and would increase response time.

In essence, it depends on your needs. For instance, are users supposed to find documents written in languages other than the one they specified? Do you want to limit searches to a single content language?

My guess is that you probably want to index the unstemmed text in "unstemmed_text" and the stemmed text in a language-specific field such as "stemmed_text_[language iso]". When searching, query both the unstemmed field and the field for the user's language, boosting the stemmed field.
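A minimal sketch of that layout in plain Java, assuming the field names above; the class and method names are hypothetical. It only builds the field map for one document and a query string in classic Lucene query-parser syntax. The actual stemming is assumed to happen in a per-field analyzer at index and query time (for example Lucene's PerFieldAnalyzerWrapper mapping each stemmed_text_* field to a language-specific analyzer), which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper illustrating the suggested field layout; not a Lucene API.
final class LanguageFields {
    static final String UNSTEMMED = "unstemmed_text";
    static final String STEMMED_PREFIX = "stemmed_text_";

    // Name of the language-specific stemmed field, e.g. "sv" -> "stemmed_text_sv".
    static String stemmedField(String languageIso) {
        return STEMMED_PREFIX + languageIso.toLowerCase(Locale.ROOT);
    }

    // Field name -> text for one document written in languageIso.
    // The same raw text goes into both fields; the analyzer configured for each
    // field (assumed, not shown) decides whether it gets stemmed.
    static Map<String, String> documentFields(String text, String languageIso) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put(UNSTEMMED, text);
        fields.put(stemmedField(languageIso), text);
        return fields;
    }

    // Query string that searches the unstemmed field plus the user's language
    // field, with the stemmed clause boosted.
    static String queryString(String userQuery, String userLanguageIso, float stemmedBoost) {
        String group = "(" + userQuery + ")";
        return UNSTEMMED + ":" + group + " "
                + stemmedField(userLanguageIso) + ":" + group + "^" + stemmedBoost;
    }

    public static void main(String[] args) {
        System.out.println(documentFields("Hundarna springer", "SV"));
        System.out.println(queryString("springer", "sv", 2.0f));
    }
}
```

Running main prints the field map and the query `unstemmed_text:(springer) stemmed_text_sv:(springer)^2.0`. The boost of 2.0 is an arbitrary example value; documents in other languages can still match through the unstemmed clause.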

I hope this helps.

--
karl
