On May 18, 2009, at 11:31 PM, Robert Muir wrote:
I am curious about this. Do you think it's a better default because
it avoids the max boolean clauses problem? Or because for a lot of
these, scoring doesn't make much sense anyway?
I ran tests on a pretty big index, and you pay a price for the constant
score/filter method. It's slower for the common-case searches; it
only starts to win for queries that return > 10% or so of the index,
and it's significantly slower for narrow queries...
I'm just trying to imagine a case where queries that return > 10% or
so of the index are actually the common/default...?
It is common in my application, a Bible program that indexes each
verse (think of a verse as a numbered sentence) as a separate
document. We index everything, including words that are typically stop
words, as those might be important to our end users. Besides this, the
top 280 word roots represent 90% of the occurrences.
And on searches, we return everything in book order, unless the user
wants to score the result. In that case, we return a small,
user-configurable number of hits ordered by score.
And we are using Lucene out of the box for the most part. We've
deviated only to incrementally solve performance problems.
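
To make the trade-off in this thread concrete, here is a rough sketch of
the two rewrite strategies as I understand them. It is a toy model in
plain Python, not Lucene's API: the index layout, the function names and
the "count matching terms" scoring are all assumptions made up for
illustration. The only number taken from Lucene is the default
BooleanQuery clause limit of 1024.

```python
# Toy model only: plain Python, not Lucene's API. All names are hypothetical.
MAX_CLAUSES = 1024  # Lucene's default BooleanQuery clause limit


class TooManyClauses(Exception):
    """Raised when a query expands to more terms than the clause limit."""


def expand_prefix(index, prefix):
    """Return the indexed terms that start with the given prefix, sorted."""
    return sorted(t for t in index if t.startswith(prefix))


def scoring_rewrite(index, prefix, max_clauses=MAX_CLAUSES):
    """One clause per matching term; each doc accumulates a score."""
    terms = expand_prefix(index, prefix)
    if len(terms) > max_clauses:
        raise TooManyClauses(f"{len(terms)} terms > {max_clauses}")
    scores = {}
    for term in terms:
        for doc_id in index[term]:
            # Stand-in for real scoring: count matching terms per doc.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0
    # Highest score first, ties broken by document order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def constant_score_rewrite(index, prefix):
    """Union the postings into a set; every hit gets the same score."""
    hits = set()
    for term in expand_prefix(index, prefix):
        hits.update(index[term])
    # No clause limit applies, and results come back in document order.
    return [(doc_id, 1.0) for doc_id in sorted(hits)]


# Each "document" is one verse, identified by its position in book order.
index = {
    "light": [2, 3, 4],
    "lights": [13, 14],
    "lighten": [14],
    "darkness": [1, 3, 4],
}

print(scoring_rewrite(index, "light"))
print(constant_score_rewrite(index, "light"))
```

In this sketch the scoring path can fail once a prefix expands past the
clause limit, while the constant score path cannot, and it returns hits
already in document (book) order, which is what we want by default.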
* Constant score rewrite ought to be the default for most multi-term
queries
--
Robert Muir
rcm...@gmail.com