OK, I posted a patch that folds the MultiPQ approach into contrib/benchmark, plus a simple Python wrapper to run old/new tests across different queries, sorts, topN values, etc.
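The thread never spells out the two strategies in code, so here is a rough illustration only. It uses plain java.util collections and is not Lucene's Collector/FieldComparator API or John's actual classes. It assumes an int sort field stored per segment as an array of values, sorted ascending with ties going to the lower doc id. "Single PQ" keeps one bounded queue shared across all segments. "Multi PQ" keeps one queue per segment and merges them once at the end. The names PqSortSketch, Hit, singlePq and multiPq are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: plain java.util collections, not Lucene's Collector API or John's classes.
public class PqSortSketch {

    /** One hit: a global doc id plus its int sort value. */
    static final class Hit {
        final int doc;
        final int value;

        Hit(int doc, int value) {
            this.doc = doc;
            this.value = value;
        }

        @Override
        public String toString() {
            return "doc=" + doc + " value=" + value;
        }
    }

    // Ascending by sort value; ties go to the lower doc id (an assumption of this sketch).
    static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingInt((Hit h) -> h.value).thenComparingInt(h -> h.doc);

    /** A queue ordered worst-first, so peek() returns the hit to evict. */
    static PriorityQueue<Hit> newQueue(int topN) {
        return new PriorityQueue<>(Math.max(1, topN), BEST_FIRST.reversed());
    }

    /** Keeps at most topN hits, replacing the current worst when a better hit arrives. */
    static void offer(PriorityQueue<Hit> pq, int topN, Hit hit) {
        if (pq.size() < topN) {
            pq.add(hit);
        } else if (!pq.isEmpty() && BEST_FIRST.compare(hit, pq.peek()) < 0) {
            pq.poll();
            pq.add(hit);
        }
    }

    /** Single PQ: one queue shared by every segment; local doc ids are rebased by docBase. */
    static List<Hit> singlePq(int[][] segments, int topN) {
        PriorityQueue<Hit> pq = newQueue(topN);
        int docBase = 0;
        for (int[] segment : segments) {
            for (int doc = 0; doc < segment.length; doc++) {
                offer(pq, topN, new Hit(docBase + doc, segment[doc]));
            }
            docBase += segment.length;
        }
        List<Hit> result = new ArrayList<>(pq);
        result.sort(BEST_FIRST);
        return result;
    }

    /** Multi PQ: a fresh queue per segment, merged once after all segments are collected. */
    static List<Hit> multiPq(int[][] segments, int topN) {
        List<Hit> merged = new ArrayList<>();
        int docBase = 0;
        for (int[] segment : segments) {
            PriorityQueue<Hit> pq = newQueue(topN);
            for (int doc = 0; doc < segment.length; doc++) {
                offer(pq, topN, new Hit(docBase + doc, segment[doc]));
            }
            merged.addAll(pq);
            docBase += segment.length;
        }
        merged.sort(BEST_FIRST);
        return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
    }

    public static void main(String[] args) {
        // Three toy "segments"; each array holds the int sort value of each local doc.
        int[][] segments = { {5, 1, 9}, {3, 7}, {2, 8, 1} };
        System.out.println(singlePq(segments, 2));
        System.out.println(multiPq(segments, 2));
    }
}
```

Both calls print the same two hits (global docs 1 and 7). The results are identical by construction. What differs is the work done along the way: the multi-PQ version fills up to topN entries per segment and pays for one final merge, while the single queue is shared and compared against across segment boundaries. Which costs less in practice is exactly what the benchmarks in this thread disagree on.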
But I got different results... MultiPQ looks generally slower than SinglePQ. So I think we now need to reconcile what's different between our tests.

Mike

On Mon, Oct 19, 2009 at 9:28 PM, John Wang <john.w...@gmail.com> wrote:
> Hi Michael:
> Was wondering if you got a chance to take a look at this.
> Since deprecated APIs are being removed in 3.0, I was wondering if/when
> we would decide whether to keep the ScoreDocComparator API for Lucene 3.0.
> Thanks
> -John
>
> On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>
>> Oh, no problem...
>>
>> Mike
>>
>> On Fri, Oct 16, 2009 at 12:33 PM, John Wang <john.w...@gmail.com> wrote:
>> > Mike, just a clarification on my first perf report email.
>> > In the first section, numHits is incorrectly labeled: it should be 20
>> > instead of 50. Sorry about the possible confusion.
>> > Thanks
>> > -John
>> >
>> > On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>> >>
>> >> Thanks John; I'll have a look.
>> >>
>> >> Mike
>> >>
>> >> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <john.w...@gmail.com> wrote:
>> >> > Hi Michael:
>> >> > I added the classes ScoreDocComparatorQueue and OneSortNoScoreCollector
>> >> > as a more general case. I think keeping the old API for ScoreDocComparator
>> >> > and SortComparatorSource would work.
>> >> > Please take a look.
>> >> > Thanks
>> >> > -John
>> >> >
>> >> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.w...@gmail.com> wrote:
>> >> >>
>> >> >> Hi Michael:
>> >> >> It is open: http://code.google.com/p/lucene-book/source/checkout
>> >> >> I think I sent the https URL instead, sorry.
>> >> >> The multi PQ sorting is fairly self-contained. I have 2 versions, 1
>> >> >> for string and 1 for int; each is a Collector impl.
>> >> >> I shouldn't say the Multi PQ is faster on int sort; it is within the
>> >> >> error bounds. The diff is very, very small; I would say they are
>> >> >> about equal.
>> >> >> If you think it is a good thing to go this way (if not for the perf,
>> >> >> then just for the simpler API), I'd be happy to work on a patch.
>> >> >> Thanks
>> >> >> -John
>> >> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>> >> >>>
>> >> >>> John, looks like this requires login -- any plans to open that up, or
>> >> >>> post the code on an issue?
>> >> >>>
>> >> >>> How self-contained is your Multi PQ sorting? E.g., is it a standalone
>> >> >>> Collector impl that I can test?
>> >> >>>
>> >> >>> Mike
>> >> >>>
>> >> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.w...@gmail.com> wrote:
>> >> >>> > BTW, we have a little sandbox for these experiments, and all my test
>> >> >>> > code is there. It is not very polished.
>> >> >>> >
>> >> >>> > https://lucene-book.googlecode.com/svn/trunk
>> >> >>> >
>> >> >>> > -John
>> >> >>> >
>> >> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.w...@gmail.com> wrote:
>> >> >>> >>
>> >> >>> >> Numbers Mike requested for Int types:
>> >> >>> >>
>> >> >>> >> Only the time/CPU time are posted; the other figures are all the same
>> >> >>> >> since the algorithm is the same.
>> >> >>> >>
>> >> >>> >> numHits | Lucene 2.9 time | Lucene 2.9 cpu | my test time | my test cpu
>> >> >>> >> 10      | 14619495        | 146126         | 14101094     | 144715
>> >> >>> >> 20      | 14550568        | 163242         | 14804821     | 151305
>> >> >>> >> 100     | 16467647        | 178379         | 15372157     | 158842
>> >> >>> >>
>> >> >>> >> Conclusions:
>> >> >>> >> They are very similar; the differences are all within error bounds,
>> >> >>> >> especially with lower PQ sizes, where the second sort algorithm is
>> >> >>> >> again slightly faster.
>> >> >>> >>
>> >> >>> >> Hope this helps.
>> >> >>> >>
>> >> >>> >> -John
>> >> >>> >>
>> >> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
>> >> >>> >>>
>> >> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>> >> >>> >>> > Though it'd be odd if the switch to searching by segment
>> >> >>> >>> > really was most of the gains here.
>> >> >>> >>>
>> >> >>> >>> I had assumed that much of the improvement was due to ditching
>> >> >>> >>> MultiTermEnum/MultiTermDocs.
>> >> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only helps
>> >> >>> >>> with queries that use a TermEnum (range, prefix, etc).
>> >> >>> >>>
>> >> >>> >>> -Yonik
>> >> >>> >>> http://www.lucidimagination.com
>> >> >>> >>>
>> >> >>> >>> ---------------------------------------------------------------------
>> >> >>> >>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >> >>> >>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
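John's int-sort numbers are easier to judge as relative differences. The small helper below computes each "my test" figure as a percent change from the Lucene 2.9 baseline, using only the values posted in the thread. The class name RelativeDiff is hypothetical, and the thread does not state the units of "time" or "cpu".

```java
// Hypothetical helper: percent change of John's "my test" numbers versus Lucene 2.9.
public class RelativeDiff {

    /** Percent change of candidate relative to baseline (negative means candidate is lower). */
    static double percentChange(long baseline, long candidate) {
        return 100.0 * (candidate - baseline) / baseline;
    }

    public static void main(String[] args) {
        int[] numHits = {10, 20, 100};
        // Values copied from John's Oct 15 int-sort post; units are not stated in the thread.
        long[] luceneTime = {14619495L, 14550568L, 16467647L};
        long[] luceneCpu = {146126L, 163242L, 178379L};
        long[] myTestTime = {14101094L, 14804821L, 15372157L};
        long[] myTestCpu = {144715L, 151305L, 158842L};
        for (int i = 0; i < numHits.length; i++) {
            System.out.printf("numHits=%d  time %+.2f%%  cpu %+.2f%%%n",
                    numHits[i],
                    percentChange(luceneTime[i], myTestTime[i]),
                    percentChange(luceneCpu[i], myTestCpu[i]));
        }
    }
}
```

By this arithmetic, John's version comes out at about -3.55% time / -0.97% CPU for numHits=10, +1.75% / -7.31% for 20, and -6.65% / -10.95% for 100. The thread gives no run-to-run variance, so these percentages alone cannot show whether the gaps fall inside the error bounds John mentions.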