OK, thanks. I can help out if you've got questions on the Python code. It's rather straightforward: it just iterates over each set of params to test, writes an alg file, runs it, opens the resulting output & parses it for the best run, confirms both single & multi PQ gave precisely the same doc IDs, and prints the results.
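A rough Python sketch of that loop follows. It assumes each benchmark invocation yields several repeated runs and that "best run" means the fastest one. Mike's actual wrapper is part of his patch and isn't reproduced here. The `runner` callable is a hypothetical stand-in for the steps that write the .alg file, run contrib/benchmark, and parse its output. The names `best_run` and `compare_impls`, and the "SinglePQ"/"MultiPQ" labels, are illustrative only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One timed run as parsed from benchmark output: (elapsed seconds, doc IDs in rank order).
Run = Tuple[float, List[int]]
# Hypothetical runner: writes an .alg file for (params, impl), runs contrib/benchmark,
# and parses the output into a list of repeated runs. Its details are not shown in the thread.
Runner = Callable[[Dict[str, object], str], List[Run]]


def best_run(runs: Sequence[Run]) -> Run:
    """Return the fastest of several repeated runs (assumed meaning of 'best run')."""
    return min(runs, key=lambda run: run[0])


def compare_impls(param_sets: Sequence[Dict[str, object]],
                  runner: Runner) -> List[Tuple[Dict[str, object], float, float]]:
    """For each parameter set, time SinglePQ vs MultiPQ and check they return identical hits."""
    results = []
    for params in param_sets:
        single_time, single_ids = best_run(runner(params, "SinglePQ"))
        multi_time, multi_ids = best_run(runner(params, "MultiPQ"))
        # Both implementations must return exactly the same doc IDs, in the same order.
        if single_ids != multi_ids:
            raise ValueError(f"SinglePQ and MultiPQ returned different doc IDs for {params}")
        change = 100.0 * (multi_time - single_time) / single_time
        print(f"{params}: SinglePQ {single_time:.3f}s, MultiPQ {multi_time:.3f}s ({change:+.1f}%)")
        results.append((params, single_time, multi_time))
    return results
```

To try it without Lucene, pass a fake runner that returns canned `(seconds, doc_ids)` tuples. A `ValueError` signals that the two collectors disagreed on the hits.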
It's remotely possible the difference in the results is a bug/overhead in contrib/benchmark itself, which'd be good to get to the bottom of anyway.

Mike

On Tue, Oct 20, 2009 at 9:17 PM, John Wang <john.w...@gmail.com> wrote:
> Hi Mike:
> That's weird. Let me take a look at the patch. Need to brush up on Python though :)
> Thanks
> -John

On Tue, Oct 20, 2009 at 10:25 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> OK, I posted a patch that folds the MultiPQ approach into contrib/benchmark, plus a simple Python wrapper to run old/new tests across different queries, sorts, topN, etc.
>
> But I got different results... MultiPQ looks generally slower than SinglePQ. So I think we now need to reconcile what's different between our tests.
>
> Mike

On Mon, Oct 19, 2009 at 9:28 PM, John Wang <john.w...@gmail.com> wrote:
> Hi Michael:
> Was wondering if you got a chance to take a look at this.
> Since deprecated APIs are being removed in 3.0, I was wondering if/when we would decide whether to keep the ScoreDocComparator API for Lucene 3.0.
> Thanks
> -John

On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Oh, no problem...
>
> Mike

On Fri, Oct 16, 2009 at 12:33 PM, John Wang <john.w...@gmail.com> wrote:
> Mike, just a clarification on my first perf report email: in the first section, numHits is incorrectly labeled; it should be 20 instead of 50. Sorry about the possible confusion.
> Thanks
> -John

On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Thanks John; I'll have a look.
>
> Mike

On Fri, Oct 16, 2009 at 12:57 AM, John Wang <john.w...@gmail.com> wrote:
> Hi Michael:
> I added the classes ScoreDocComparatorQueue and OneSortNoScoreCollector as a more general case. I think keeping the old API for ScoreDocComparator and SortComparatorSource would work.
> Please take a look.
> Thanks
> -John

On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.w...@gmail.com> wrote:
> Hi Michael:
> It is open: http://code.google.com/p/lucene-book/source/checkout
> I think I sent the https URL instead, sorry.
> The multi PQ sorting is fairly self-contained. I have 2 versions, 1 for string and 1 for int; each is a Collector impl.
> I shouldn't say the multi PQ is faster on int sort; it is within the error boundary. The diff is very, very small; I would say they are about equal.
> If you think it is a good thing to go this way (if not for the perf, then just for the simpler API), I'd be happy to work on a patch.
> Thanks
> -John

On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> John, looks like this requires login. Any plans to open that up, or post the code on an issue?
>
> How self-contained is your Multi PQ sorting? E.g., is it a standalone Collector impl that I can test?
>
> Mike

On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.w...@gmail.com> wrote:
> BTW, we have a little sandbox for these experiments, and all my test code is there. It is not very polished.
>
> https://lucene-book.googlecode.com/svn/trunk
>
> -John

On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.w...@gmail.com> wrote:
> Numbers Mike requested for int types:
>
> Only the time/CPU time are posted; the other stats are all the same since the algorithm is the same.
>
>            Lucene 2.9             my test
> numHits    time       cpu        time       cpu
> 10         14619495   146126     14101094   144715
> 20         14550568   163242     14804821   151305
> 100        16467647   178379     15372157   158842
>
> Conclusions:
> They are very similar; the differences are all within error bounds, especially with lower PQ sizes, with the second sort algorithm again slightly faster.
>
> Hope this helps.
>
> -John

On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>> Though it'd be odd if the switch to searching by segment really was most of the gains here.
>
> I had assumed that much of the improvement was due to ditching MultiTermEnum/MultiTermDocs.
> Note that LUCENE-1483 was before LUCENE-1596... but that only helps with queries that use a TermEnum (range, prefix, etc.).
>
> -Yonik
> http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
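To make the SinglePQ vs. MultiPQ comparison in this thread concrete, here is a small Python model of the two collection strategies. It rests on an assumption the thread does not spell out: SinglePQ feeds every hit from every segment into one bounded priority queue, while MultiPQ keeps one bounded queue per segment and merges them at the end. The model uses an int sort key with ties broken by global doc ID. That tie-break is what lets both strategies return exactly the same doc IDs, the property Mike's wrapper checks. John's real Collector implementations are Java and also cover string sorting, where a per-segment queue would presumably compare segment-local values; that part is not modeled. The names `top_n`, `single_pq`, and `multi_pq` are hypothetical.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple

Hit = Tuple[int, int]  # (sort value, doc ID); smaller sorts first


def top_n(hits: Iterable[Hit], n: int) -> List[Hit]:
    """Bounded priority queue: keep the n best (smallest) hits, ties broken by doc ID."""
    if n <= 0:
        return []
    heap: List[Hit] = []  # holds negated hits, so heap[0] is the worst hit kept so far
    for value, doc in hits:
        entry = (-value, -doc)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:  # new hit sorts before the worst kept hit
            heapq.heapreplace(heap, entry)
    return sorted((-v, -d) for v, d in heap)


def single_pq(segments: Sequence[Sequence[int]], n: int) -> List[Hit]:
    """SinglePQ (assumed model): every hit from every segment goes into one shared queue."""
    def all_hits():
        base = 0  # docBase of the current segment
        for values in segments:
            for local_doc, value in enumerate(values):
                yield value, base + local_doc
            base += len(values)
    return top_n(all_hits(), n)


def multi_pq(segments: Sequence[Sequence[int]], n: int) -> List[Hit]:
    """MultiPQ (assumed model): one queue per segment, merged into a final queue."""
    per_segment: List[Hit] = []
    base = 0
    for values in segments:
        local_top = top_n(((v, d) for d, v in enumerate(values)), n)
        per_segment.extend((v, base + d) for v, d in local_top)  # rebase to global doc IDs
        base += len(values)
    return top_n(per_segment, n)
```

Because any hit in the global top N is necessarily in its own segment's top N, merging the per-segment queues cannot lose a result. The performance question debated above is whether the extra per-segment queue work and final merge pay for themselves.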