On 1 June 05, at 01:12, Erik Hatcher wrote:
1/ one index for all languages
2/ one index for all languages, with an extra language field so
searches can be constrained to a particular language
3/ separate indices for each language?
I would vote for option #2 as it gives the most flexibility - you can
query with or without concern for language.
The way I've solved this is to use a different field name per language,
since our documents can be multilingual.
Query expansion is then done at query time: given a term query for
text, I duplicate it for each language the user accepts, with a boost
factor related to the user's preference for that language (e.g. the q
factor in the Accept-Language HTTP header). Presumably I could also use
solution 2/ if my queries become too big, making a separate document
for each language of the original document.
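
To make the expansion step concrete, here is a rough sketch in Python rather than our actual code. The field naming scheme (text_fr, text_en, ...) and the function names are assumptions chosen for illustration. The output uses the field:term^boost form of the Lucene query syntax, and the term is assumed to be already analyzed and escaped.

```python
def parse_accept_language(header):
    """Return (language, q) pairs from an Accept-Language header, best q first.

    Region subtags are dropped (fr-CH becomes fr), and the wildcard "*"
    and entries with q=0 are ignored.
    """
    best = {}
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        tag = pieces[0].lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q as "not accepted"
        if q <= 0:
            continue
        lang = tag.split("-")[0]
        best[lang] = max(q, best.get(lang, 0.0))
    return sorted(best.items(), key=lambda item: -item[1])


def expand_term(term, header, field_prefix="text_"):
    """Duplicate one term query across the per-language fields, boosted by q."""
    clauses = [
        f"{field_prefix}{lang}:{term}^{q}"
        for lang, q in parse_accept_language(header)
    ]
    return " ".join(clauses)


if __name__ == "__main__":
    print(expand_term("dspace", "fr, en;q=0.8, de;q=0.5"))
```

The clauses are separated by spaces, so the query parser combines them as optional clauses when its default operator is OR. A user with a long Accept-Language list gets one clause per language for every term, which is where the queries could become too big.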
I think it's very important to take care in guessing which languages
the user accepts. Typically, Google's default behaviour is to give you
matches only in your primary language, but then to allow expansion to
any language.
On the other hand, if people are searching for proper nouns in
metadata (e.g. "DSpace") it may be advantageous to search all
languages at once.
This one may need particular treatment.
Let us know how it goes!
paul