In my tests the problem seemed to boil down to iterating over a sparse OpenBitSet... so maybe the filter approach is still an option, but when the number of matching docs is small some other doc ID set implementation is used?
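To make the idea concrete, here is a rough sketch of what "some other doc ID set impl" could look like. It is not Lucene code: `AdaptiveDocIdSet`, `build` and `denseFraction` are made-up names, and `java.util.BitSet` stands in for OpenBitSet. It keeps a sorted int array when few docs match, so iteration touches only the matches, and falls back to a bit set otherwise. The 10% cutoff in the usage note just echoes the rough figure from the thread and would need to be measured.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch: pick a doc ID set representation by match count. */
final class AdaptiveDocIdSet {
    private final int[] sortedDocs; // non-null only for the sparse case
    private final BitSet bits;      // non-null only for the dense case

    private AdaptiveDocIdSet(int[] sortedDocs, BitSet bits) {
        this.sortedDocs = sortedDocs;
        this.bits = bits;
    }

    /**
     * @param docs          matching doc IDs, any order, assumed unique
     * @param maxDoc        number of docs in the index
     * @param denseFraction assumed tunable cutoff, e.g. 0.10
     */
    static AdaptiveDocIdSet build(int[] docs, int maxDoc, double denseFraction) {
        if (docs.length < denseFraction * maxDoc) {
            // Few matches: a sorted array iterates in O(matches).
            int[] copy = docs.clone();
            Arrays.sort(copy);
            return new AdaptiveDocIdSet(copy, null);
        }
        // Many matches: one bit per doc in the index.
        BitSet b = new BitSet(maxDoc);
        for (int d : docs) {
            b.set(d);
        }
        return new AdaptiveDocIdSet(null, b);
    }

    boolean isSparse() {
        return sortedDocs != null;
    }

    /** Returns the matching doc IDs in increasing order. */
    int[] toSortedArray() {
        if (sortedDocs != null) {
            return sortedDocs.clone();
        }
        int[] out = new int[bits.cardinality()];
        int i = 0;
        for (int d = bits.nextSetBit(0); d >= 0; d = bits.nextSetBit(d + 1)) {
            out[i++] = d;
        }
        return out;
    }
}
```

For example, `AdaptiveDocIdSet.build(new int[] {42, 7, 900}, 1000, 0.10)` would take the sorted-array path, since 3 matches is well under 10% of 1000 docs.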

On Tue, May 19, 2009 at 8:28 AM, Mark Miller <markrmil...@gmail.com> wrote:

> Michael McCandless wrote:
>
>> On Mon, May 18, 2009 at 11:31 PM, Robert Muir <rcm...@gmail.com> wrote:
>>
>>> I am curious about this, do you think it's a better default because it
>>> avoids the max boolean clauses problem? or because for a lot of these
>>> scoring doesn't make much sense anyway?
>>
>> I think you're referring to constant score mode default, for
>> MultiTermQuery & QueryParser, right?
>>
>>> I ran tests on a pretty big index, you pay a price for the constant
>>> score/filter method. It's slower for the common case searches, it only
>>> starts to win for queries that return > 10% or so of the index, but it's
>>> significantly slower for narrow queries...
>>>
>>> I'm just trying to imagine a case where queries that return > 10% or so
>>> of the index are actually the common/default...?
>>
>> Excellent points! And this also makes clear why healthy discussion on
>> each default is important, as well as how good it'd be to have
>> Settings online so that we are free to even have such discussions
>> (vs being bound by back-compat which prevents any improvements
>> to the defaults).
>>
>> I was actually referring to the fact that scores for MultiTermQuery
>> rewritten to BooleanQuery are often meaningless to the app (I
>> think?). But you're right the performance cost of the "make a filter
>> up front" approach is too high for smallish queries.
>>
>> Thinking more on this... I'd love to have a constant-score mode, but
>> implemented as a BooleanQuery, meaning the scores would be the same
>> (constant) regardless of whether under-the-hood the query was
>> rewritten to BooleanQuery vs pre-compiled up front into a BitSet.
>>
>> This would then decouple scoring from rewrite method, which in turn
>> would give us the freedom to pick and choose the fastest impl based on
>> particulars of the query.
>>
>> So if we had such a ConstantScoreBooleanQuery, and we fixed
>> MultiTermQuery to conditionally use that, then I think we'd want
>> MultiTermQuery to do constant scoring by default. (And, it'd then be
>> free to pick whether "create filter up front" or "use
>> ConstantScoreBooleanQuery" was most performant, query by query).
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
> +1. ConstantScoreQuery is only a performance win when there are lots of
> matches (it seems), but the lack of TooManyClauses exceptions is also a big
> win. I want the best of both worlds :)
>
> --
> - Mark
>
> http://www.lucidimagination.com

--
Robert Muir
rcm...@gmail.com