On May 18, 2009, at 11:31 PM, Robert Muir wrote:

I am curious about this: do you think it's a better default because it avoids the max boolean clauses problem, or because for a lot of these queries scoring doesn't make much sense anyway?

I ran tests on a pretty big index, and you pay a price for the constant score/filter method. It's slower for the common-case searches. It only starts to win for queries that return > 10% or so of the index, and it's significantly slower for narrow queries...
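To make the trade-off concrete, here is a toy model in Java of the two ways a prefix query can be rewritten. It is not Lucene's actual API: the class RewriteSketch, the TreeMap standing in for the term dictionary, and all method names are hypothetical. Only the 1024 limit mirrors Lucene's historical BooleanQuery.maxClauseCount default. The boolean-style rewrite produces one clause per matching term and fails past the limit. The constant-score rewrite ORs every term's postings into one bit set, so every hit gets the same score.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

/** Toy model (hypothetical, not Lucene's API) of two prefix-query rewrites. */
public class RewriteSketch {
    // Mirrors Lucene's historical default BooleanQuery clause limit.
    static final int MAX_CLAUSE_COUNT = 1024;

    /** Collects every term in the (hypothetical) dictionary that starts with prefix. */
    static List<String> expandPrefix(TreeMap<String, int[]> index, String prefix) {
        List<String> terms = new ArrayList<>();
        for (String term : index.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) break; // sorted keys: no more matches
            terms.add(term);
        }
        return terms;
    }

    /** Boolean-style rewrite: one scored clause per term, fails past the limit. */
    static List<String> booleanRewrite(TreeMap<String, int[]> index, String prefix) {
        List<String> clauses = expandPrefix(index, prefix);
        if (clauses.size() > MAX_CLAUSE_COUNT) {
            throw new IllegalStateException("too many clauses: " + clauses.size());
        }
        return clauses;
    }

    /** Constant-score rewrite: OR every matching term's postings into one bit set. */
    static BitSet constantScoreFilter(TreeMap<String, int[]> index, String prefix, int maxDoc) {
        BitSet bits = new BitSet(maxDoc); // sized to the whole index
        for (String term : expandPrefix(index, prefix)) {
            for (int doc : index.get(term)) {
                bits.set(doc);
            }
        }
        return bits; // every set doc would get the same score
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> index = new TreeMap<>();
        index.put("love", new int[]{0, 2, 5});
        index.put("loved", new int[]{2, 7});
        index.put("lovest", new int[]{9});
        index.put("lust", new int[]{3});
        System.out.println(booleanRewrite(index, "lov"));
        System.out.println(constantScoreFilter(index, "lov", 10));
    }
}
```

Roughly speaking, the bit set is sized to the whole index however few documents match, which is one plausible reason the filter approach only pays off for broad queries.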

I'm just trying to imagine a case where queries that return > 10% or so of the index are actually the common/default...?

It is common in my application, a Bible program that indexes each verse (think of a verse as a numbered sentence) as a separate document. We index everything, including words that are typically stop words, as those might be important to our end users. Besides this, the top 280 word roots represent 90% of the occurrences.
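The "top 280 roots cover 90% of occurrences" figure is a cumulative-frequency statistic. As a rough sketch of how such a number can be computed from per-term occurrence counts: the class and method names below are hypothetical, and the counts in main are made up for illustration, not real Bible statistics.

```java
import java.util.Arrays;

/** Hypothetical helper: how many of the most frequent terms cover a given share of occurrences. */
public class TermCoverage {
    static int termsToCover(long[] counts, double fraction) {
        long[] sorted = counts.clone();
        Arrays.sort(sorted); // ascending; walk from the end to go most-frequent first
        long total = 0;
        for (long c : sorted) total += c;
        long running = 0;
        for (int k = 1; k <= sorted.length; k++) {
            running += sorted[sorted.length - k];
            if (running >= fraction * total) return k;
        }
        return sorted.length;
    }

    public static void main(String[] args) {
        // Made-up occurrence counts per word root, for illustration only.
        long[] counts = {120, 500, 30, 300, 50};
        System.out.println(termsToCover(counts, 0.9));
    }
}
```

Note that this counts term occurrences, not matching documents. The two are related but not the same, since a common word can appear several times in one verse.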

And on searches, we return everything in book order, unless the user wants to score the results. In that case, we return a small, user-configurable number of hits ordered by score.
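The two result modes can be sketched in Java as follows. This assumes, hypothetically, that document ids were assigned in book order when the verses were indexed, so sorting by id gives book order. The Hit record and both method names are illustrative only, not part of Lucene or of the application.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch of the two result modes (requires Java 16+ for records). */
public class ResultOrdering {
    /** Hypothetical hit: doc id (assumed to follow book order) and a score. */
    record Hit(int doc, double score) {}

    /** Default mode: every hit, in ascending doc-id (book) order. */
    static List<Hit> bookOrder(List<Hit> hits) {
        List<Hit> out = new ArrayList<>(hits);
        out.sort(Comparator.comparingInt(Hit::doc));
        return out;
    }

    /** Scored mode: only the top n hits, highest score first. */
    static List<Hit> topByScore(List<Hit> hits, int n) {
        // Min-heap on score: the weakest of the kept hits is evicted first.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.<Hit>comparingDouble(Hit::score));
        for (Hit h : hits) {
            heap.offer(h);
            if (heap.size() > n) heap.poll();
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(Comparator.<Hit>comparingDouble(Hit::score).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit(7, 0.2), new Hit(1, 0.9), new Hit(4, 0.5), new Hit(2, 0.7));
        System.out.println(bookOrder(hits));
        System.out.println(topByScore(hits, 2));
    }
}
```

Keeping a min-heap bounded at n means the scored mode holds only about n hits in memory, no matter how many verses match.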

And we are using Lucene out of the box for the most part. We've deviated only to incrementally solve performance problems.

 * Constant score rewrite ought to be the default for most multi-term queries

--
Robert Muir
rcm...@gmail.com
