One more thing I missed. I don't quite get your point about skip() vs next().
With OR queries, skipping does not help as much as it does with AND queries.

-John

On Tue, Jan 6, 2009 at 11:55 PM, John Wang wrote:

> Paul:
>
> Our very simple/naive testing methodology for OrDocIdSetIterator:
>
> 5 sub-iterators, each sub-iterator just iterates from 0 to 1,000,000.
>
> The test iterates the OrDocIdSetIterator until next() is false.
>
> Do you want me to run the same test against DisjunctDisi?
>
> -John
>
> On Tue, Jan 6, 2009 at 11:48 PM, Paul Elschot wrote:
>
>> On Wednesday 07 January 2009 07:36:06 John Wang wrote:
>> > Hi guys:
>> >
>> > We have been building a suite of boolean operators DocIdSets
>> > (e.g. AndDocIdSet/Iterator, OrDocIdSet/Iterator,
>> > NotDocIdSet/Iterator). We compared our implementation on the
>> > OrDocIdSetIterator (based on DisjunctionMaxScorer code) with some
>> > code tuning, and we see the performance doubled in our testing.
>>
>> That's good news.
>> What data structure did you use for sorting by doc id?
>> Currently a priority queue is used for that, and normally that is
>> the bottleneck for performance.
>>
>> > (we haven't done comparisons with ConjunctionScorer vs.
>> > AndDocIdSetIterator, will post numbers when we do)
>> >
>> > We'd be happy to contribute this back to the community. But what
>> > is the best way of going about it?
>> >
>> > option 1: merge our change into DisjunctionMax/SumScorers.
>> > option 2: contribute boolean operator sets, and have
>> > DisjunctionScorers derive from OrDocIdSetIterator, ConjunctionScorer
>> > derive from AndDocIdSetIterator etc.
>> >
>> > Option 2 seems to be cleaner. Thoughts?
>>
>> Some theoretical performance improvement is possible when the
>> minimum number of required scorers/iterators is higher than 1,
>> by using skipTo() (as much as possible) instead of next() in
>> such cases. For the moment that's theoretical because there
>> is no working implementation of this yet, but have a look at
>> LUCENE-1345.
>>
>> I'm currently working on a DisjunctionDISI, probably the same
>> function as the OrDocIdSetIterator you mentioned above. In case you
>> have something faster than that, could you post it at LUCENE-1345
>> or at a new issue?
>>
>> An AndDocIdSetIterator could also be useful for the PhraseScorers and
>> for the SpanNear queries, but that is of later concern.
>>
>> So I'd prefer option 2.
>>
>> Regards,
>> Paul Elschot
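
For readers following the thread, here is a minimal Java sketch of the priority-queue style of OR iteration discussed above, together with a test loop shaped like the one John describes. It is not the OrDocIdSetIterator, the DisjunctionDISI, or any Lucene class; the names OrIteratorSketch, ArrayIterator, OrIterator and range are made up for illustration, the sub-iterators are plain sorted int arrays, and skipTo() is a simple linear scan rather than real skip data.

```java
import java.util.PriorityQueue;

/** Illustrative sketch only; all names here are hypothetical, not Lucene APIs. */
public class OrIteratorSketch {

    /** Sub-iterator over a sorted array of doc ids (stand-in for a real posting list). */
    static final class ArrayIterator {
        private final int[] docs;
        private int pos = -1;

        ArrayIterator(int[] docs) { this.docs = docs; }

        int doc() { return docs[pos]; }

        /** Advance by one doc; returns false when exhausted. */
        boolean next() { return ++pos < docs.length; }

        /** Advance to the first doc >= target (linear scan, assumed for simplicity). */
        boolean skipTo(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
    }

    /** Union of sub-iterators, ordered by current doc id with a priority queue. */
    static final class OrIterator {
        private final PriorityQueue<ArrayIterator> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        private int current = -1;

        OrIterator(ArrayIterator... subs) {
            // Position every sub-iterator on its first doc before queueing it.
            for (ArrayIterator s : subs) {
                if (s.next()) queue.add(s);
            }
        }

        int doc() { return current; }

        boolean next() {
            if (queue.isEmpty()) return false;
            current = queue.peek().doc();
            // Advance every sub-iterator that sits on the doc being returned.
            while (!queue.isEmpty() && queue.peek().doc() == current) {
                ArrayIterator top = queue.poll();
                if (top.next()) queue.add(top);
            }
            return true;
        }

        boolean skipTo(int target) {
            // Each sub-iterator behind the target has to be moved individually.
            while (!queue.isEmpty() && queue.peek().doc() < target) {
                ArrayIterator top = queue.poll();
                if (top.skipTo(target)) queue.add(top);
            }
            return next();
        }
    }

    /** Doc ids 0 .. n-1. */
    static int[] range(int n) {
        int[] docs = new int[n];
        for (int i = 0; i < n; i++) docs[i] = i;
        return docs;
    }

    public static void main(String[] args) {
        // Five identical sub-iterators, iterated until next() is false.
        int n = 1_000_000; // assumed size, following the numbers in the thread
        ArrayIterator[] subs = new ArrayIterator[5];
        for (int i = 0; i < subs.length; i++) subs[i] = new ArrayIterator(range(n));
        OrIterator or = new OrIterator(subs);
        long count = 0;
        while (or.next()) count++;
        System.out.println(count);
    }
}
```

In this sketch, skipTo() on the union still has to touch every sub-iterator that is behind the target, which is one way to see why skipping tends to buy less for OR than for AND when only one match is required. Every queue operation costs O(log k) for k sub-iterators, which is consistent with the remark above that the priority queue is normally the bottleneck.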
