Heikki, it does solve your main concern: a term in Lucene is a pair of a field name and a token. The term frequency is thus the frequency of a token in a field.
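
As a rough illustration (a plain-Python sketch, not Lucene code; the toy corpus and the helper names document_frequencies and idf are made up for this example, and the IDF is a generic smoothed textbook form), counting statistics per (field, token) pair looks like this:

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: each document maps a field name to its tokens.
docs = [
    {"text-stemmed-en": ["firewall", "rule", "firewall"]},
    {"text-stemmed-en": ["network", "rule"]},
    {"text-stemmed-de": ["firewall", "regel"]},
    {"text-stemmed-de": ["firewall", "netzwerk"]},
    {"text-stemmed-de": ["firewall"]},
]


def document_frequencies(docs):
    """Count, for each (field, token) term, how many documents contain it."""
    df = defaultdict(int)
    for doc in docs:
        for field, tokens in doc.items():
            # set() so a token repeated inside one document counts once
            for token in set(tokens):
                df[(field, token)] += 1
    return df


def idf(df, term, num_docs):
    """Generic smoothed IDF (an assumption for illustration only)."""
    return 1.0 + math.log(num_docs / (1 + df.get(term, 0)))


df = document_frequencies(docs)
en_term = ("text-stemmed-en", "firewall")
de_term = ("text-stemmed-de", "firewall")

# The same token is two distinct terms with separate statistics.
print(en_term, df[en_term], idf(df, en_term, len(docs)))
print(de_term, df[de_term], idf(df, de_term, len(docs)))
```

In this toy data the German field sees "firewall" in three documents and the English field in one, so the same token gets a different IDF in each field even though everything sits in one index. The num_docs argument is passed in explicitly; which document count a real scorer uses is not modelled here.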
So the term frequency (and likewise the document frequency behind the IDF) of text-stemmed-de:firewall is independent of that of text-stemmed-en:firewall, for example. With the query expansion mechanism, though, both term queries will likely be present and both will contribute to the score, which I think is correct.

paul


On 3 Jan 2012, at 15:06, heikki wrote:

>
>> The important bit is to use query-expansion.
>> Given a query of the user (with params or not, with text-queries), expand
>> it to a query where the "normal text" is expected to be in the right
>> language, but maybe also in one of the other languages (that
>> the browser says, that your platform supports), with less weight of
>> course.
>
> something like that we do now in a single index solution - results in the
> requested language are boosted enough so they're always on top
>
> I don't think though that this addresses what is my main point: the
> frequency of terms in different domains (in this case, different languages)
> is different for each domain. This means that if the domains are chunked
> together in one index, the IDF value for a term is less "accurate" than if
> multiple, separate indexes were used. A term is more or less frequent in
> one domain or another, for a reason.. Relevance ranking is impacted by
> that, and is more accurate if separate indexes are used -- I think this
> seems logical.
>
> I just don't know how much impact it really has, and whether it is worth to
> deal with it by presenting separate result sets from separate index
> searches ..