Re: lucene 2.9 sorting algorithm

John Wang Tue, 20 Oct 2009 18:18:19 -0700

Hi Mike:
    That's weird. Let me take a look at the patch. I need to brush up on my
Python, though :)
Thanks
-John


On Tue, Oct 20, 2009 at 10:25 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> OK I posted a patch that folds the MultiPQ approach into
> contrib/benchmark, plus a simple python wrapper to run old/new tests
> across different queries, sort, topN, etc.
>
> But I got different results... MultiPQ looks generally slower than
> SinglePQ.  So I think we now need to reconcile what's different
> between our tests.
>
> Mike
>
> On Mon, Oct 19, 2009 at 9:28 PM, John Wang <john.w...@gmail.com> wrote:
> > Hi Michael:
> >      Was wondering if you got a chance to take a look at this.
> >      Since deprecated APIs are being removed in 3.0, I was wondering
> > if/when we would decide whether to keep the ScoreDocComparator API, and
> > thus whether it would be kept for Lucene 3.0.
> > Thanks
> > -John
> >
> > On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >>
> >> Oh, no problem...
> >>
> >> Mike
> >>
> >> On Fri, Oct 16, 2009 at 12:33 PM, John Wang <john.w...@gmail.com>
> >> wrote:
> >> > Mike, just a clarification on my first perf report email.
> >> > In the first section, numHits is incorrectly labeled: it should be 20
> >> > instead of 50. Sorry about the possible confusion.
> >> > Thanks
> >> > -John
> >> >
> >> > On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
> >> > <luc...@mikemccandless.com> wrote:
> >> >>
> >> >> Thanks John; I'll have a look.
> >> >>
> >> >> Mike
> >> >>
> >> >> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <john.w...@gmail.com>
> >> >> wrote:
> >> >> > Hi Michael:
> >> >> >     I added the classes ScoreDocComparatorQueue and
> >> >> > OneSortNoScoreCollector as a more general case. I think keeping the
> >> >> > old API for ScoreDocComparator and SortComparatorSource would work.
> >> >> >   Please take a look.
> >> >> > Thanks
> >> >> > -John
> >> >> >
> >> >> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.w...@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi Michael:
> >> >> >>      It is open: http://code.google.com/p/lucene-book/source/checkout
> >> >> >>      I think I sent the https URL instead, sorry.
> >> >> >>      The multi PQ sorting is fairly self-contained. I have 2
> >> >> >> versions, 1 for string and 1 for int; each is a Collector impl.
> >> >> >>      I shouldn't say the Multi PQ is faster on int sort; it is
> >> >> >> within the error boundary. The diff is very, very small; I would
> >> >> >> say they are more or less equal.
> >> >> >>      If you think it is a good thing to go this way (if not for the
> >> >> >> perf, just for the simpler API), I'd be happy to work on a patch.
> >> >> >> Thanks
> >> >> >> -John
> >> >> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
> >> >> >> <luc...@mikemccandless.com> wrote:
> >> >> >>>
> >> >> >>> John, looks like this requires login -- any plans to open that up,
> >> >> >>> or, post the code on an issue?
> >> >> >>>
> >> >> >>> How self-contained is your Multi PQ sorting?  EG is it a standalone
> >> >> >>> Collector impl that I can test?
> >> >> >>>
> >> >> >>> Mike
> >> >> >>>
> >> >> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.w...@gmail.com>
> >> >> >>> wrote:
> >> >> >>> > BTW, we have a little sandbox for these experiments, and all my
> >> >> >>> > test code is at the URL below. It is not very polished.
> >> >> >>> >
> >> >> >>> > https://lucene-book.googlecode.com/svn/trunk
> >> >> >>> >
> >> >> >>> > -John
> >> >> >>> >
> >> >> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.w...@gmail.com>
> >> >> >>> > wrote:
> >> >> >>> >>
> >> >> >>> >> Numbers Mike requested for Int types:
> >> >> >>> >>
> >> >> >>> >> Only the time/cputime are posted; the others are all the same
> >> >> >>> >> since the algorithm is the same.
> >> >> >>> >>
> >> >> >>> >> Lucene 2.9:
> >> >> >>> >> numHits: 10
> >> >> >>> >> time: 14619495
> >> >> >>> >> cpu: 146126
> >> >> >>> >>
> >> >> >>> >> numHits: 20
> >> >> >>> >> time: 14550568
> >> >> >>> >> cpu: 163242
> >> >> >>> >>
> >> >> >>> >> numHits: 100
> >> >> >>> >> time: 16467647
> >> >> >>> >> cpu: 178379
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> my test:
> >> >> >>> >> numHits: 10
> >> >> >>> >> time: 14101094
> >> >> >>> >> cpu: 144715
> >> >> >>> >>
> >> >> >>> >> numHits: 20
> >> >> >>> >> time: 14804821
> >> >> >>> >> cpu: 151305
> >> >> >>> >>
> >> >> >>> >> numHits: 100
> >> >> >>> >> time: 15372157
> >> >> >>> >> cpu: 158842
> >> >> >>> >>
> >> >> >>> >> Conclusions:
> >> >> >>> >> They are very similar; the differences are all within error
> >> >> >>> >> bounds, especially with lower PQ sizes, where the second sort
> >> >> >>> >> algorithm is again slightly faster.
> >> >> >>> >>
> >> >> >>> >> Hope this helps.
> >> >> >>> >>
> >> >> >>> >> -John
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
> >> >> >>> >> <yo...@lucidimagination.com>
> >> >> >>> >> wrote:
> >> >> >>> >>>
> >> >> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
> >> >> >>> >>> <luc...@mikemccandless.com> wrote:
> >> >> >>> >>> > Though it'd be odd if the switch to searching by segment
> >> >> >>> >>> > really was most of the gains here.
> >> >> >>> >>>
> >> >> >>> >>> I had assumed that much of the improvement was due to ditching
> >> >> >>> >>> MultiTermEnum/MultiTermDocs.
> >> >> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only
> >> >> >>> >>> helps with queries that use a TermEnum (range, prefix, etc).
> >> >> >>> >>>
> >> >> >>> >>> -Yonik
> >> >> >>> >>> http://www.lucidimagination.com
> >> >> >>> >>>
> >> >> >>> >>>
> >> >> >>> >>>
> >> >> >>> >>>
> >> >> >>> >>> ---------------------------------------------------------------------
> >> >> >>> >>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> >> >> >>> >>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> >> >> >>> >>>
> >> >> >>> >>
> >> >> >>> >
> >> >> >>> >
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >
> >
>
>
>
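
For readers following the MultiPQ versus SinglePQ discussion in this thread, here is a minimal Python sketch of the two ideas. It is not the Lucene code or the patch mentioned above (those are Java Collector implementations); the function names, the `(sort_value, doc_id)` tuples, and the notion of a "segment" as a plain list are all assumptions made for illustration. The single-queue version keeps one bounded heap across all segments; the multi-queue version keeps a top-N per segment and merges at the end.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Sequence, Tuple

# Hypothetical stand-in for a hit: (sort_value, doc_id), ordered ascending.
Hit = Tuple[int, int]


def top_n_single_pq(segments: Sequence[Iterable[Hit]], n: int) -> List[Hit]:
    """Keep one bounded heap shared by every segment."""
    if n <= 0:
        return []
    # heapq is a min-heap, so store negated hits: the root is then the
    # worst (largest) hit currently kept, which is the one to evict.
    heap: List[Hit] = []
    for segment in segments:
        for value, doc in segment:
            item = (-value, -doc)
            if len(heap) < n:
                heapq.heappush(heap, item)
            elif item > heap[0]:  # strictly better than the current worst
                heapq.heapreplace(heap, item)
    return sorted((-value, -doc) for value, doc in heap)


def top_n_multi_pq(segments: Sequence[Iterable[Hit]], n: int) -> List[Hit]:
    """Keep an independent top-n per segment, then merge the sorted results."""
    if n <= 0:
        return []
    per_segment = [heapq.nsmallest(n, segment) for segment in segments]
    return list(islice(heapq.merge(*per_segment), n))


if __name__ == "__main__":
    demo = [[(5, 0), (1, 1), (9, 2)], [(3, 3), (7, 4)], [(2, 5)]]
    print(top_n_single_pq(demo, 3))
    print(top_n_multi_pq(demo, 3))
```

Both functions return the same hits; they differ only in where the comparisons happen, which is the trade-off the benchmarks in this thread are trying to measure.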

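The int-sort figures quoted earlier in the thread can be put side by side as percentage changes. The sketch below only restates the numbers as posted (the thread does not give their units, and the second figure in each pair is the one labeled cpu); the dictionary and function names are hypothetical, and a small percentage here says nothing about statistical significance.

```python
from typing import Dict, Tuple

# (time, cpu) per numHits, copied from the thread; units are not stated there.
LUCENE_29: Dict[int, Tuple[int, int]] = {
    10: (14619495, 146126),
    20: (14550568, 163242),
    100: (16467647, 178379),
}
# The run labeled "my test" in the thread.
JOHN_TEST: Dict[int, Tuple[int, int]] = {
    10: (14101094, 144715),
    20: (14804821, 151305),
    100: (15372157, 158842),
}


def percent_change(baseline: float, candidate: float) -> float:
    """Signed change of candidate relative to baseline, in percent."""
    return 100.0 * (candidate - baseline) / baseline


def compare() -> Dict[int, Tuple[float, float]]:
    """Return {numHits: (time change %, cpu change %)}; negative means lower."""
    rows = {}
    for num_hits, (base_time, base_cpu) in LUCENE_29.items():
        new_time, new_cpu = JOHN_TEST[num_hits]
        rows[num_hits] = (
            percent_change(base_time, new_time),
            percent_change(base_cpu, new_cpu),
        )
    return rows


if __name__ == "__main__":
    for num_hits, (time_pct, cpu_pct) in compare().items():
        print(f"numHits={num_hits:>3}  time {time_pct:+.2f}%  cpu {cpu_pct:+.2f}%")
```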