Re: DisjunctionScorer performance

John Wang Tue, 06 Jan 2009 23:58:27 -0800

One more thing I missed. I don't quite get your point about skip() vs
next().


With OR queries, skipping does not help as much as it does with AND queries.

-John
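
The remark above about skipping in OR versus AND queries can be illustrated with a small, self-contained Java sketch. Nothing in it is Lucene code: ArrayDocIdIterator, and(), or() and range() are hypothetical names made up for this example, and a binary search over an in-memory array stands in for a real skip list. The sketch only counts how many iterator calls each operator makes when a 3-entry list is combined with a 1,000,000-entry list.

```java
import java.util.ArrayList;
import java.util.List;

final class SkipVsNextSketch {

    /** Hypothetical in-memory iterator over a sorted int[] of doc ids (not a Lucene class). */
    static final class ArrayDocIdIterator {
        private final int[] docs;
        private int pos = -1;
        long calls = 0; // number of next()/skipTo() calls made on this iterator

        ArrayDocIdIterator(int[] sortedDocs) { this.docs = sortedDocs; }

        int doc() { return docs[pos]; }

        /** Moves to the following entry; returns false once the list is exhausted. */
        boolean next() {
            calls++;
            if (pos + 1 >= docs.length) { pos = docs.length; return false; }
            pos++;
            return true;
        }

        /** Moves to the first entry >= target that lies beyond the current position. */
        boolean skipTo(int target) {
            calls++;
            int lo = pos + 1, hi = docs.length;
            while (lo < hi) { // binary search stands in for a skip list (assumption)
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return pos < docs.length;
        }
    }

    /** AND: the iterator that is behind leaps to the other's doc id with skipTo(). */
    static List<Integer> and(ArrayDocIdIterator a, ArrayDocIdIterator b) {
        List<Integer> hits = new ArrayList<>();
        if (!a.next() || !b.next()) return hits;
        while (true) {
            int da = a.doc(), db = b.doc();
            if (da == db) {
                hits.add(da);
                if (!a.next() || !b.next()) return hits;
            } else if (da < db) {
                if (!a.skipTo(db)) return hits;
            } else {
                if (!b.skipTo(da)) return hits;
            }
        }
    }

    /** OR: every entry of every list is a match, so there is nothing to skip over. */
    static List<Integer> or(ArrayDocIdIterator a, ArrayDocIdIterator b) {
        List<Integer> hits = new ArrayList<>();
        boolean hasA = a.next(), hasB = b.next();
        while (hasA || hasB) {
            int da = hasA ? a.doc() : Integer.MAX_VALUE;
            int db = hasB ? b.doc() : Integer.MAX_VALUE;
            int min = Math.min(da, db);
            hits.add(min);
            if (hasA && da == min) hasA = a.next();
            if (hasB && db == min) hasB = b.next();
        }
        return hits;
    }

    /** Returns the doc ids 0 .. n-1. */
    static int[] range(int n) {
        int[] docs = new int[n];
        for (int i = 0; i < n; i++) docs[i] = i;
        return docs;
    }

    public static void main(String[] args) {
        int[] sparse = {10, 500000, 999999};
        ArrayDocIdIterator a = new ArrayDocIdIterator(sparse);
        ArrayDocIdIterator b = new ArrayDocIdIterator(range(1000000));
        and(a, b);
        System.out.println("AND calls: " + (a.calls + b.calls));

        a = new ArrayDocIdIterator(sparse);
        b = new ArrayDocIdIterator(range(1000000));
        or(a, b);
        System.out.println("OR calls:  " + (a.calls + b.calls));
    }
}
```

In this toy setup the AND loop finishes after a handful of calls because the dense list is only ever asked for the sparse list's doc ids, while the OR loop has to step through all entries of both lists. A real skip list has its own per-skip cost, so the call counts are only a rough indication.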

On Tue, Jan 6, 2009 at 11:55 PM, John Wang <[email protected]> wrote:

> Paul:
>
>        Our very simple/naive testing methodology for OrDocIdSetIterator:
>
> 5 sub-iterators, each of which just iterates from 0 to 1,000,000.
>
> The test iterates the OrDocIdSetIterator until next() is false.
>
>       Do you want me to run the same test against DisjunctDisi?
>
> -John
>
>
> On Tue, Jan 6, 2009 at 11:48 PM, Paul Elschot <[email protected]> wrote:
>
>>  On Wednesday 07 January 2009 07:36:06 John Wang wrote:
>>
>> > Hi guys:
>>
>> >
>>
>> > We have been building a suite of boolean operators DocIdSets
>>
>> > (e.g. AndDocIdSet/Iterator, OrDocIdSet/Iterator,
>>
>> > NotDocIdSet/Iterator). We compared our implementation on the
>>
>> > OrDocIdSetIterator (based on DisjunctionMaxScorer code) with some
>>
>> > code tuning, and we see the performance doubled in our testing.
>>
>> That's good news.
>>
>> What data structure did you use for sorting by doc id?
>>
>> Currently a priority queue is used for that, and normally that is
>>
>> the bottleneck for performance.
>>
>> > (we
>>
>> > haven't done comparisons with ConjunctionScorer vs.
>>
>> > AndDocIdSetIterator, will post numbers when we do)
>>
>> >
>>
>> > We'd be happy to contribute this back to the community. But what
>>
>> > is the best way of going about it?
>>
>> >
>>
>> > option 1: merge our change into DisjunctionMax/SumScorers.
>>
>> > option 2: contribute boolean operator sets, and have
>>
>> > DisjunctionScorers derive from OrDocIdSetIterator, ConjunctionScorer
>>
>> > derive from AndDocIdSetIterator etc.
>>
>> >
>>
>> > Option 2 seems to be cleaner. Thoughts?
>>
>> Some theoretical performance improvement is possible when the
>>
>> minimum number of required scorers/iterators is higher than 1,
>>
>> by using skipTo() (as much as possible) instead of next() in
>>
>> such cases. For the moment that's theoretical because there
>>
>> is no working implementation of this yet, but have a look at
>>
>> LUCENE-1345 .
>>
>> I'm currently working on a DisjunctionDISI, probably the same function as
>> the OrDocIdSetIterator you mentioned above. In case you have
>>
>> something faster than that, could you post it at LUCENE-1345 or at a
>>
>> new issue?
>>
>> An AndDocIdSetIterator could also be useful for the PhraseScorers and
>>
>> for the SpanNear queries, but that is a later concern.
>>
>> So I'd prefer option 2.
>>
>> Regards,
>>
>> Paul Elschot
>>
>>
>
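
The priority-queue approach and the naive test described in the quoted messages can be sketched together as follows. This is an illustration under stated assumptions, not the code either poster was measuring: DocIdIterator, RangeIterator, OrIterator and countUnion() are hypothetical names, java.util.PriorityQueue stands in for Lucene's own doc-id queue, and "0 to 1,000,000" is read as an exclusive upper bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class OrIteratorSketch {

    /** Minimal iterator contract assumed for this sketch (not Lucene's DocIdSetIterator). */
    interface DocIdIterator {
        boolean next(); // advance; false when exhausted
        int doc();      // current doc id, valid after next() returned true
    }

    /** Yields start, start+1, ..., endExclusive-1. */
    static final class RangeIterator implements DocIdIterator {
        private final int end;
        private int cur;

        RangeIterator(int start, int endExclusive) { cur = start - 1; end = endExclusive; }

        public boolean next() {
            if (cur + 1 >= end) return false;
            cur++;
            return true;
        }

        public int doc() { return cur; }
    }

    /** OR over sub-iterators, kept ordered by current doc id in a binary heap. */
    static final class OrIterator implements DocIdIterator {
        private final PriorityQueue<DocIdIterator> queue = new PriorityQueue<>(
            (DocIdIterator x, DocIdIterator y) -> Integer.compare(x.doc(), y.doc()));
        private int current = -1;

        OrIterator(List<DocIdIterator> subs) {
            for (DocIdIterator sub : subs) {
                if (sub.next()) queue.add(sub); // only non-empty sub-iterators enter the heap
            }
        }

        public boolean next() {
            if (queue.isEmpty()) return false;
            current = queue.peek().doc();
            // Advance every sub-iterator sitting on the current doc so that
            // duplicates are reported once. A sub-iterator is removed before it
            // is advanced and re-added afterwards, because java.util.PriorityQueue
            // cannot re-order an element that is changed in place.
            while (!queue.isEmpty() && queue.peek().doc() == current) {
                DocIdIterator top = queue.poll();
                if (top.next()) queue.add(top);
            }
            return true;
        }

        public int doc() { return current; }
    }

    /** The naive test: subCount identical sub-iterators over 0 .. maxDoc-1. */
    static long countUnion(int subCount, int maxDoc) {
        List<DocIdIterator> subs = new ArrayList<>();
        for (int i = 0; i < subCount; i++) subs.add(new RangeIterator(0, maxDoc));
        OrIterator or = new OrIterator(subs);
        long n = 0;
        while (or.next()) n++;
        return n;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long n = countUnion(5, 1000000);
        long ms = (System.nanoTime() - start) / 1000000;
        System.out.println(n + " docs in " + ms + " ms");
    }
}
```

With five identical sub-iterators every doc id costs five heap removals and up to five re-insertions, which is where the queue becomes the bottleneck. A single timed run like the one in main() is not a reliable benchmark because of JIT warm-up, so the printed time is only indicative.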
