I did not follow the whole thread, but I do not understand what is so bad about the new API that it would justify preserving the old one. The old API does not fit well with segment-based search, and a lot of ugly code had to be written to make both APIs behave the same.
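To make "segment-based" concrete, here is a minimal sketch of a slot-based comparator that is handed one segment at a time. It is only loosely modelled on the shape of the 2.9 comparators: SlotComparator, IntSlotComparator, SegmentSortSketch and setNextSegment() are hypothetical names, a plain int[] stands in for the per-segment field values, and a linear scan replaces the priority queue the real collectors use.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a per-segment, slot-based comparator (not a Lucene class). */
abstract class SlotComparator {
    abstract int compare(int slot1, int slot2);   // compare two collected hits
    abstract void setBottom(int slot);            // remember the weakest collected hit
    abstract int compareBottom(int doc);          // compare weakest hit with a new doc
    abstract void copy(int slot, int doc);        // store a doc's value in a slot
    abstract void setNextSegment(int[] values);   // switch to the next segment's values
}

/** Ascending int sort; values are copied into slots so they survive segment switches. */
final class IntSlotComparator extends SlotComparator {
    private final int[] slotValues;
    private int[] current;   // values of the segment currently being searched
    private int bottom;

    IntSlotComparator(int numHits) { slotValues = new int[numHits]; }

    @Override int compare(int s1, int s2) { return Integer.compare(slotValues[s1], slotValues[s2]); }
    @Override void setBottom(int slot) { bottom = slotValues[slot]; }
    @Override int compareBottom(int doc) { return Integer.compare(bottom, current[doc]); }
    @Override void copy(int slot, int doc) { slotValues[slot] = current[doc]; }
    @Override void setNextSegment(int[] values) { current = values; }
    int value(int slot) { return slotValues[slot]; }
}

final class SegmentSortSketch {
    /** Returns the numHits smallest values across all segments, in ascending order. */
    static int[] topValues(int[][] segments, int numHits) {
        if (numHits <= 0) return new int[0];
        IntSlotComparator cmp = new IntSlotComparator(numHits);
        int filled = 0;
        int bottomSlot = -1;
        for (int[] segment : segments) {
            cmp.setNextSegment(segment);   // doc ids below are segment-local
            for (int doc = 0; doc < segment.length; doc++) {
                if (filled < numHits) {
                    cmp.copy(filled++, doc);
                    if (filled == numHits) {
                        bottomSlot = worstSlot(cmp, filled);
                        cmp.setBottom(bottomSlot);
                    }
                } else if (cmp.compareBottom(doc) > 0) {
                    // The weakest collected value is larger, so this doc replaces it.
                    cmp.copy(bottomSlot, doc);
                    bottomSlot = worstSlot(cmp, filled);
                    cmp.setBottom(bottomSlot);
                }
            }
        }
        int[] out = new int[filled];
        for (int i = 0; i < filled; i++) out[i] = cmp.value(i);
        Arrays.sort(out);
        return out;
    }

    /** Linear scan for the slot holding the largest value (a queue would do this faster). */
    private static int worstSlot(SlotComparator cmp, int filled) {
        int worst = 0;
        for (int s = 1; s < filled; s++) {
            if (cmp.compare(s, worst) > 0) worst = s;
        }
        return worst;
    }
}
```

The point of the sketch is that the comparator never needs a top-level view of the index: it only sees the values of the current segment and keeps its own copies of the competitive ones.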
For me it is not very complicated to create a new-style Comparator. The only difference is that you have to implement more methods for the comparison, but if you take, e.g., the provided comparators for the basic data types as a base, it is easy to understand how they work, and you can modify the examples.

And: as far as I know, the old API does not really work segment-wise, so the cost of reopen() is much higher and FieldCache gets slower, because the top-level reader must be reloaded into the cache rather than the individual segments.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

_____

From: Jake Mannix [mailto:[email protected]]
Sent: Tuesday, October 20, 2009 8:37 AM
To: [email protected]
Subject: Re: lucene 2.9 sorting algorithm

Given that this new API is pretty unwieldy, and seems to not actually perform any better than the old one... are we going to consider revisiting that?

-jake

On Mon, Oct 19, 2009 at 11:27 PM, Uwe Schindler <[email protected]> wrote:

The old search API is already removed in trunk.

Uwe

_____

From: John Wang [mailto:[email protected]]
Sent: Tuesday, October 20, 2009 3:28 AM
To: [email protected]
Subject: Re: lucene 2.9 sorting algorithm

Hi Michael:

Was wondering if you got a chance to take a look at this.

Since deprecated APIs are being removed in 3.0, I was wondering if/when we would decide on keeping the ScoreDocComparator API, so that it would be kept for Lucene 3.0.

Thanks

-John

On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless <[email protected]> wrote:

Oh, no problem...

Mike

On Fri, Oct 16, 2009 at 12:33 PM, John Wang <[email protected]> wrote:
> Mike, just a clarification on my first perf report email.
> The first section, numHits is incorrectly labeled, it should be 20 instead
> of 50. Sorry about the possible confusion.
> Thanks
> -John
>
> On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
> <[email protected]> wrote:
>>
>> Thanks John; I'll have a look.
>>
>> Mike
>>
>> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <[email protected]> wrote:
>> > Hi Michael:
>> > I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as
>> > a more general case. I think keeping the old API for ScoreDocComparator
>> > and SortComparatorSource would work.
>> > Please take a look.
>> > Thanks
>> > -John
>> >
>> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <[email protected]> wrote:
>> >>
>> >> Hi Michael:
>> >> It is open, http://code.google.com/p/lucene-book/source/checkout
>> >> I think I sent the https URL instead, sorry.
>> >> The multi PQ sorting is fairly self-contained. I have 2 versions, 1
>> >> for string and 1 for int; each is a Collector impl.
>> >> I shouldn't say the Multi PQ is faster on int sort; it is within the
>> >> error boundary. The diff is very, very small; I would say they are
>> >> more or less equal.
>> >> If you think it is a good thing to go this way (if not for the perf,
>> >> just for the simpler API), I'd be happy to work on a patch.
>> >> Thanks
>> >> -John
>> >>
>> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
>> >> <[email protected]> wrote:
>> >>>
>> >>> John, looks like this requires login -- any plans to open that up, or
>> >>> post the code on an issue?
>> >>>
>> >>> How self-contained is your Multi PQ sorting? E.g., is it a standalone
>> >>> Collector impl that I can test?
>> >>>
>> >>> Mike
>> >>>
>> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <[email protected]>
>> >>> wrote:
>> >>> > BTW, we have a little sandbox for these experiments, and all my
>> >>> > test code is at the URL below. It is not very polished.
>> >>> >
>> >>> > https://lucene-book.googlecode.com/svn/trunk
>> >>> >
>> >>> > -John
>> >>> >
>> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <[email protected]>
>> >>> > wrote:
>> >>> >>
>> >>> >> Numbers Mike requested for int types:
>> >>> >>
>> >>> >> Only the time/cpu time are posted; the others are all the same
>> >>> >> since the algorithm is the same.
>> >>> >>
>> >>> >> Lucene 2.9:
>> >>> >> numHits: 10
>> >>> >> time: 14619495
>> >>> >> cpu: 146126
>> >>> >>
>> >>> >> numHits: 20
>> >>> >> time: 14550568
>> >>> >> cpu: 163242
>> >>> >>
>> >>> >> numHits: 100
>> >>> >> time: 16467647
>> >>> >> cpu: 178379
>> >>> >>
>> >>> >>
>> >>> >> my test:
>> >>> >> numHits: 10
>> >>> >> time: 14101094
>> >>> >> cpu: 144715
>> >>> >>
>> >>> >> numHits: 20
>> >>> >> time: 14804821
>> >>> >> cpu: 151305
>> >>> >>
>> >>> >> numHits: 100
>> >>> >> time: 15372157
>> >>> >> cpu: 158842
>> >>> >>
>> >>> >> Conclusions:
>> >>> >> They are very similar; the differences are all within error bounds,
>> >>> >> especially with lower PQ sizes, with the second sort algorithm
>> >>> >> again slightly faster.
>> >>> >>
>> >>> >> Hope this helps.
>> >>> >>
>> >>> >> -John
>> >>> >>
>> >>> >>
>> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
>> >>> >> <[email protected]>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
>> >>> >>> <[email protected]> wrote:
>> >>> >>> > Though it'd be odd if the switch to searching by segment
>> >>> >>> > really was most of the gains here.
>> >>> >>>
>> >>> >>> I had assumed that much of the improvement was due to ditching
>> >>> >>> MultiTermEnum/MultiTermDocs.
>> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only
>> >>> >>> helps with queries that use a TermEnum (range, prefix, etc).
>> >>> >>>
>> >>> >>> -Yonik
>> >>> >>> http://www.lucidimagination.com
>> >>> >>>
>> >>> >>> ---------------------------------------------------------------------
>> >>> >>> To unsubscribe, e-mail: [email protected]
>> >>> >>> For additional commands, e-mail: [email protected]
