RE: lucene 2.9 sorting algorithm

Uwe Schindler Tue, 20 Oct 2009 01:22:59 -0700

I did not follow the whole thread, but I do not understand what is so bad
about the new API that it justifies preserving the old one. The old API does
not fit very well with segment-based search, and a lot of ugly workarounds
were needed to make both APIs behave the same.


 

For me it is not very complicated to create a new-style Comparator. The only
difference is that you have to implement more methods for the comparison,
but if you e.g. take the provided comparators for the basic data types as a
base, it is easy to understand how it works and you can modify the examples.
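
To illustrate the extra methods, here is a minimal sketch in plain Java. It
is not the actual Lucene code: the method names mirror those of the 2.9
FieldComparator as I recall them, while SegmentInts and TopInts are
hypothetical stand-ins so the example runs without Lucene on the classpath.

```java
import java.util.Arrays;

// Hypothetical stand-in for one index segment: values[i] is the sort
// field of segment-local document i.
final class SegmentInts {
    final int[] values;
    SegmentInts(int[] values) { this.values = values; }
}

// Plain-Java analog of a new-style, per-segment int comparator.
final class IntSlotComparator {
    private final int[] slots;   // copied values of the current top hits
    private int[] current;       // values of the segment being searched
    private int bottom;          // value of the weakest hit in the queue

    IntSlotComparator(int numHits) { slots = new int[numHits]; }

    int compare(int slot1, int slot2) { return Integer.compare(slots[slot1], slots[slot2]); }
    void setBottom(int slot) { bottom = slots[slot]; }
    // Same result as compare() with the bottom as slot1 and the doc as slot2.
    int compareBottom(int doc) { return Integer.compare(bottom, current[doc]); }
    void copy(int slot, int doc) { slots[slot] = current[doc]; }
    void setNextReader(SegmentInts segment) { current = segment.values; }
    int value(int slot) { return slots[slot]; }
}

// Hypothetical driver: keeps the numHits smallest values across segments.
// A linear scan replaces the real priority queue to keep the sketch short.
final class TopInts {
    static int[] smallest(int numHits, SegmentInts... segments) {
        if (numHits < 1) throw new IllegalArgumentException("numHits must be >= 1");
        IntSlotComparator cmp = new IntSlotComparator(numHits);
        int filled = 0;
        int bottomSlot = -1;
        for (SegmentInts seg : segments) {
            cmp.setNextReader(seg);              // switch to the next segment
            for (int doc = 0; doc < seg.values.length; doc++) {
                if (filled < numHits) {
                    cmp.copy(filled++, doc);     // queue not full yet
                    if (filled == numHits) {
                        bottomSlot = worstSlot(cmp, numHits);
                        cmp.setBottom(bottomSlot);
                    }
                } else if (cmp.compareBottom(doc) > 0) {
                    cmp.copy(bottomSlot, doc);   // doc beats the weakest hit
                    bottomSlot = worstSlot(cmp, numHits);
                    cmp.setBottom(bottomSlot);
                }
            }
        }
        int[] out = new int[filled];
        for (int i = 0; i < filled; i++) out[i] = cmp.value(i);
        Arrays.sort(out);
        return out;
    }

    private static int worstSlot(IntSlotComparator cmp, int n) {
        int worst = 0;
        for (int s = 1; s < n; s++) if (cmp.compare(s, worst) > 0) worst = s;
        return worst;
    }
}
```

The point of the slot-based design is that values are copied out of the
current segment, so hits from different segments can be compared without a
top-level view of the index.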

 

And: as far as I know, the old API is not really segment-wise, so the cost
of reopen() is much higher and FieldCache gets slower, because the whole
top-level reader must be reloaded into the cache rather than just the
segments.
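
A rough sketch of why this matters, again in plain Java and not the real
FieldCache: SegmentValueCache, warm() and loadValues() are hypothetical
names, and the segment names are made up. It assumes that a reopened reader
shares its unchanged segments with the old one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache keyed by segment instead of by top-level reader.
final class SegmentValueCache {
    private final Map<String, int[]> bySegment = new HashMap<>();

    // "Warms" the cache for a reader made of the given segments and
    // returns how many segments actually had to be loaded.
    int warm(String... segmentNames) {
        int loaded = 0;
        for (String name : segmentNames) {
            if (!bySegment.containsKey(name)) {
                bySegment.put(name, loadValues(name));
                loaded++;
            }
        }
        return loaded;
    }

    // Hypothetical loader; stands in for reading a field's values
    // from one segment, which is the expensive part.
    private static int[] loadValues(String name) {
        return new int[] { name.length() };
    }
}
```

With this, warming a reader over segments "_0" and "_1" loads two entries,
and warming the reopened reader over "_0", "_1" and "_2" loads only the new
one. A cache keyed by the top-level reader would have to load everything
again after each reopen.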

 

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

  _____  

From: Jake Mannix [mailto:[email protected]] 
Sent: Tuesday, October 20, 2009 8:37 AM
To: [email protected]
Subject: Re: lucene 2.9 sorting algorithm

 

Given that this new API is pretty unwieldy, and seems to not actually
perform any better than the old one... are we going to consider revisiting
that?

  -jake

On Mon, Oct 19, 2009 at 11:27 PM, Uwe Schindler <[email protected]> wrote:

The old search API is already removed in trunk.

 

Uwe


  _____  

From: John Wang [mailto:[email protected]] 
Sent: Tuesday, October 20, 2009 3:28 AM
To: [email protected]
Subject: Re: lucene 2.9 sorting algorithm

 

Hi Michael:

 

     Was wondering if you got a chance to take a look at this.

 

     Since deprecated APIs are being removed in 3.0, I was wondering if/when
we would decide whether to keep the ScoreDocComparator API, so that it would
be retained for Lucene 3.0.

 

Thanks

 

-John

On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless
<[email protected]> wrote:

Oh, no problem...

Mike


On Fri, Oct 16, 2009 at 12:33 PM, John Wang <[email protected]> wrote:
> Mike, just a clarification on my first perf report email.
> In the first section, numHits is incorrectly labeled; it should be 20
> instead of 50. Sorry about the possible confusion.
> Thanks
> -John
>
> On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
> <[email protected]> wrote:
>>
>> Thanks John; I'll have a look.
>>
>> Mike
>>
>> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <[email protected]> wrote:
>> > Hi Michael:
>> >     I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector
>> > as a more general case. I think keeping the old api for ScoreDocComparator
>> > and SortComparatorSource would work.
>> >   Please take a look.
>> > Thanks
>> > -John
>> >
>> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <[email protected]> wrote:
>> >>
>> >> Hi Michael:
>> >>      It is open, http://code.google.com/p/lucene-book/source/checkout
>> >>      I think I sent the https url instead, sorry.
>> >>     The multi PQ sorting is fairly self-contained. I have 2 versions,
>> >> 1 for string and 1 for int; each is a Collector impl.
>> >>      I shouldn't say the Multi PQ is faster on int sort; it is within
>> >> the error boundary. The diff is very, very small; I would say they are
>> >> more or less equal.
>> >>      If you think it is a good thing to go this way (if not for the
>> >> perf, just for the simpler api), I'd be happy to work on a patch.
>> >> Thanks
>> >> -John
>> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
>> >> <[email protected]> wrote:
>> >>>
>> >>> John, looks like this requires login -- any plans to open that up,
>> >>> or post the code on an issue?
>> >>>
>> >>> How self-contained is your Multi PQ sorting?  EG is it a standalone
>> >>> Collector impl that I can test?
>> >>>
>> >>> Mike
>> >>>
>> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <[email protected]>
>> >>> wrote:
>> >>> > BTW, we have a little sandbox for these experiments, and all my
>> >>> > test code is at the URL below. It is not very polished.
>> >>> >
>> >>> > https://lucene-book.googlecode.com/svn/trunk
>> >>> >
>> >>> > -John
>> >>> >
>> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <[email protected]>
>> >>> > wrote:
>> >>> >>
>> >>> >> Numbers Mike requested for Int types:
>> >>> >>
>> >>> >> only the time/cputime are posted, others are all the same since
>> >>> >> the algorithm is the same.
>> >>> >>
>> >>> >> Lucene 2.9:
>> >>> >> numhits: 10
>> >>> >> time: 14619495
>> >>> >> cpu: 146126
>> >>> >>
>> >>> >> numhits: 20
>> >>> >> time: 14550568
>> >>> >> cpu: 163242
>> >>> >>
>> >>> >> numhits: 100
>> >>> >> time: 16467647
>> >>> >> cpu: 178379
>> >>> >>
>> >>> >>
>> >>> >> my test:
>> >>> >> numHits: 10
>> >>> >> time: 14101094
>> >>> >> cpu: 144715
>> >>> >>
>> >>> >> numHits: 20
>> >>> >> time: 14804821
>> >>> >> cpu: 151305
>> >>> >>
>> >>> >> numHits: 100
>> >>> >> time: 15372157
>> >>> >> cpu: 158842
>> >>> >>
>> >>> >> Conclusions:
>> >>> >> They are very similar; the differences are all within error bounds,
>> >>> >> especially with lower PQ sizes, with the second sort alg again
>> >>> >> slightly faster.
>> >>> >>
>> >>> >> Hope this helps.
>> >>> >>
>> >>> >> -John
>> >>> >>
>> >>> >>
>> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
>> >>> >> <[email protected]>
>> >>> >> wrote:
>> >>> >>>
>> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
>> >>> >>> <[email protected]> wrote:
>> >>> >>> > Though it'd be odd if the switch to searching by segment
>> >>> >>> > really was most of the gains here.
>> >>> >>>
>> >>> >>> I had assumed that much of the improvement was due to ditching
>> >>> >>> MultiTermEnum/MultiTermDocs.
>> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only
>> >>> >>> helps
>> >>> >>> with queries that use a TermEnum (range, prefix, etc).
>> >>> >>>
>> >>> >>> -Yonik
>> >>> >>> http://www.lucidimagination.com
>> >>> >>>
>> >>> >>>
>> >>> >>>
>> >>> >>> ---------------------------------------------------------------------
>> >>> >>> To unsubscribe, e-mail: [email protected]
>> >>> >>> For additional commands, e-mail: [email protected]
>> >>> >>>
>> >>> >>
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>
>> >
>> >
>>
>>
>
>
