Re: lucene 2.9 sorting algorithm

John Wang Wed, 21 Oct 2009 23:17:43 -0700

Hi Mike:
     I have been playing with the patch, and I think I have some information
that you might like.


     Let me spend some time gathering more numbers and update JIRA.

Thanks

btw:

     About the conversion on multi-valued fields, I am not sure I get it
(sorry for being ignorant):

     Say bottom has ords 23, 45, 76, each corresponding to a string. When
moving to the next segment, you need to give bottom ords that are
comparable to other docs in this new segment, so you would need to find the
new ords for the values behind 23, 45 and 76, wouldn't you? To find them,
assuming the values are s1, s2, s3, you would do a binary search on the new
val array and find the index for each of s1, s2, s3. That is 3 binary
searches per convert, and I am not sure how you can short-circuit it. Are
you suggesting we compare Comparables in compareBottom until some doc beats
it? That would hurt performance a lot though, no?
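
A minimal sketch of the conversion described above, assuming the new
segment exposes its sorted, de-duplicated term values as a String[]. The
class and method names are made up for illustration and are not Lucene API:

```java
import java.util.Arrays;

/** Hypothetical helper (not Lucene API): re-maps bottom's ords on a segment transition. */
final class BottomOrdConverter {

    /**
     * @param bottomValues  the values behind bottom's old ords (s1, s2, s3 in the text)
     * @param segmentValues the new segment's term values, sorted ascending (assumption)
     * @return one entry per bottom value: its ord in the new segment, or
     *         -(insertionPoint) - 1 if the value does not occur in this segment
     */
    static int[] convert(String[] bottomValues, String[] segmentValues) {
        int[] newOrds = new int[bottomValues.length];
        for (int i = 0; i < bottomValues.length; i++) {
            // One binary search per value, so k values cost O(k log n) per convert.
            newOrds[i] = Arrays.binarySearch(segmentValues, bottomValues[i]);
        }
        return newOrds;
    }
}
```

A negative result still carries ordering information: the insertion point
says where the missing value would fall among the new segment's ords.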

-John

On Wed, Oct 21, 2009 at 3:11 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Tue, Oct 20, 2009 at 11:55 AM, John Wang <john.w...@gmail.com> wrote:
>
> > the simpler api places less restriction on the type of custom
> > sorting that can be done.
>
> Just to verify: this is not a back-compat break, right?
>
> Because, in 2.4, such an interesting custom sort must've been
> operating at the top-level index reader level, which is easy to carry
> over to 2.9 (you just rebase the docIDs).
>
> But, of course in moving to 2.9, you would like to also switch your
> custom sort to be per-segment (for faster reopen/near real-time perf),
> but the new sort API makes this more difficult because it requires
> that you are able to compare hits across different segments during the
> search, not just at the end.
>
> But then I don't understand the difficulty of doing that: if we had a
> Collector with the MultiPQ approach, at the end during merge, you'd
> also have to compare results across segments, ie, upgrade your ords to
> their real values.  The MultiPQ approach does this by calling
> sortValue (returns Comparable) in the end.
>
> Putting performance aside for now... when comparing bottom, you don't
> actually have to "truly invert" Comparable -> ord on segment
> transition.  You could, instead, get the Comparable for each and
> compare, but then note the smallest ord for the current segment that
> has failed to compete, and short-circuit the compareBottom test by
> checking against that ord. That should enable carrying over the custom
> sort to the single PQ API without needing to invert ord->value.
>
> We'd obviously have to test performance...
>
> Or, we could commit the MultiPQ approach as another sorting collector?
> I know it's not great having two wildly different sort APIs, but both
> APIs seem to have their strengths in different cases.
>
> Mike

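The short-circuit idea in the quoted message can be sketched roughly as
follows. This is a single-valued, ascending-sort illustration only; the
class, its fields and the slowCompares counter are hypothetical and not
Lucene API. It assumes ords within a segment follow value order, and that
bottom only ever becomes more competitive during a search:

```java
/** Hypothetical sketch (not Lucene API) of short-circuiting the bottom comparison by ord. */
final class ShortCircuitBottom {
    private final String[] segmentValues; // ord -> value, sorted ascending (assumption)
    private String bottom;                // current weakest value in the queue
    private int smallestFailedOrd = Integer.MAX_VALUE;
    int slowCompares = 0;                 // counts full value comparisons, for illustration

    ShortCircuitBottom(String[] segmentValues, String bottom) {
        this.segmentValues = segmentValues;
        this.bottom = bottom;
    }

    /** Returns true if the doc with this ord would beat bottom. */
    boolean competes(int ord) {
        // Any ord at or above one that already failed must fail too,
        // because values grow with ords inside a segment.
        if (ord >= smallestFailedOrd) {
            return false;
        }
        slowCompares++;
        if (segmentValues[ord].compareTo(bottom) < 0) {
            return true;
        }
        smallestFailedOrd = ord; // remember the smallest ord that failed to compete
        return false;
    }

    /** Call after a competitive hit replaces bottom; the new bottom must not be larger. */
    void setBottom(String newBottom) {
        this.bottom = newBottom;
    }
}
```

A fresh instance would be created on each segment transition, since the
remembered ord is only meaningful within one segment. Whether this is fast
enough in practice would still have to be measured.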