Re: Lucene's default settings & back compatibility

Robert Muir Tue, 19 May 2009 05:51:35 -0700

In my tests the problem seemed to boil down to iteration of a sparse
OpenBitSet... so maybe the filter approach is still an option, but when the
number of docs is small some other DocIdSet impl is used?
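
To illustrate the idea, here is a minimal sketch (not Lucene code). It assumes java.util.BitSet as a stand-in for OpenBitSet, and the class name, the helper methods, and the 1% cutoff are all hypothetical, picked only to show how a sparse result could be switched to a sorted array of doc ids:

```java
import java.util.BitSet;

// Illustrative only: java.util.BitSet stands in for Lucene's OpenBitSet,
// and every name below is hypothetical, not part of the Lucene API.
class SparseDocIdSetSketch {

    // Walks the set bits. The scan still has to step over every empty
    // word, so the cost grows with maxDoc even when there are few hits.
    static long sumBitSet(BitSet bits) {
        long sum = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            sum += doc;
        }
        return sum;
    }

    // Walks a sorted array of doc ids; the cost depends only on the hit count.
    static long sumSorted(int[] docs) {
        long sum = 0;
        for (int doc : docs) {
            sum += doc;
        }
        return sum;
    }

    // Copies the set bits into a sorted int array (bits are visited in order).
    static int[] toSortedArray(BitSet bits) {
        int[] docs = new int[bits.cardinality()];
        int i = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            docs[i++] = doc;
        }
        return docs;
    }

    // Assumed cutoff of 1% of maxDoc, chosen arbitrarily for this sketch.
    static boolean preferSortedArray(int hitCount, int maxDoc) {
        return (long) hitCount * 100 < maxDoc;
    }

    public static void main(String[] args) {
        int maxDoc = 1000000;
        BitSet bits = new BitSet(maxDoc);
        bits.set(3);
        bits.set(500000);
        bits.set(999999);
        int hits = bits.cardinality();
        if (preferSortedArray(hits, maxDoc)) {
            System.out.println("sorted array, sum=" + sumSorted(toSortedArray(bits)));
        } else {
            System.out.println("bitset, sum=" + sumBitSet(bits));
        }
    }
}
```

Where the real crossover sits would have to be measured; the sketch only shows the shape of the choice.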


On Tue, May 19, 2009 at 8:28 AM, Mark Miller <markrmil...@gmail.com> wrote:

> Michael McCandless wrote:
>
>> On Mon, May 18, 2009 at 11:31 PM, Robert Muir <rcm...@gmail.com> wrote:
>>
>>
>>> I am curious about this, do you think it's a better default because it
>>> avoids the max boolean clauses problem? Or because for a lot of these,
>>> scoring doesn't make much sense anyway?
>>>
>>>
>>
>> I think you're referring to constant score mode default, for
>> MultiTermQuery & QueryParser, right?
>>
>>
>>
>>> I ran tests on a pretty big index, and you pay a price for the constant
>>> score/filter method. It's slower for the common-case searches: it only
>>> starts to win for queries that return > 10% or so of the index, but it's
>>> significantly slower for narrow queries...
>>>
>>> I'm just trying to imagine a case where queries that return > 10% or so
>>> of the index are actually the common/default...?
>>>
>>>
>>
>> Excellent points!  And this also makes clear why healthy discussion on
>> each default is important, as well as how good it'd be to have
>> Settings online so that we are free to even have such discussions
>> (vs being bound by back-compat which prevents any improvements
>> to the defaults).
>>
>> I was actually referring to the fact that scores for MultiTermQuery
>> rewritten to BooleanQuery are often meaningless to the app (I
>> think?).  But you're right the performance cost of the "make a filter
>> up front" approach is too high for smallish queries.
>>
>> Thinking more on this... I'd love to have a constant-score mode, but
>> implemented as a BooleanQuery, meaning the scores would be the same
>> (constant) regardless of whether under-the-hood the query was
>> rewritten to BooleanQuery vs pre-compiled up front into a BitSet.
>>
>> This would then decouple scoring from rewrite method, which in turn
>> would give us the freedom to pick and choose the fastest impl based on
>> particulars of the query.
>>
>> So if we had such a ConstantScoreBooleanQuery, and we fixed
>> MultiTermQuery to conditionally use that, then I think we'd want
>> MultiTermQuery to do constant scoring by default.  (And, it'd then be
>> free to pick whether "create filter up front" or "use
>> ConstantScoreBooleanQuery" was most performant, query by query).
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>>
>>
> +1. ConstantScoreQuery is only a performance win when there are lots of
> matches (it seems), but the lack of TooManyClauses exceptions is also a big
> win. I want the best of both worlds :)
>
> --
> - Mark
>
> http://www.lucidimagination.com
>


-- 
Robert Muir
rcm...@gmail.com
