That should be quite possible to reimplement, I believe. You can keep your
docid->ordinal map bound to the top-level reader, as it was before, and then
have your FieldComparator rebase incoming compare() docids based on the
docBase that the last setNextReader() call received.
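A minimal sketch of that rebasing idea, assuming one precomputed top-level
ordinal array. The class and method names below are illustrative and only
mimic the shape of 2.9's per-segment comparator contract, not Lucene's exact
signatures:

```java
// Illustrative sketch (not Lucene code): a comparator that keeps one
// top-level docid -> ordinal array and rebases segment-relative docids
// using the docBase handed to setNextReader().
public class OrdinalRebasingComparator {
    private final int[] topLevelOrds; // ordinal per top-level docid
    private final int[] slots;        // ordinals copied into collector slots
    private int docBase;              // start of the current segment

    public OrdinalRebasingComparator(int[] topLevelOrds, int numHits) {
        this.topLevelOrds = topLevelOrds;
        this.slots = new int[numHits];
    }

    // Called when collection moves to the next segment reader.
    public void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    // Segment-relative docid is rebased into the top-level docid space.
    public void copy(int slot, int segmentDoc) {
        slots[slot] = topLevelOrds[docBase + segmentDoc];
    }

    public int compare(int slot1, int slot2) {
        return Integer.compare(slots[slot1], slots[slot2]);
    }

    public static void main(String[] args) {
        int[] ords = {3, 0, 2, 1}; // precomputed sort ordinals, one per doc
        OrdinalRebasingComparator c = new OrdinalRebasingComparator(ords, 2);
        c.setNextReader(2); // second segment starts at top-level doc 2
        c.copy(0, 0);       // segment doc 0 -> top-level doc 2 -> ord 2
        c.copy(1, 1);       // segment doc 1 -> top-level doc 3 -> ord 1
        System.out.println(c.compare(0, 1)); // prints 1
    }
}
```

The point is that only setNextReader() and copy() need to know about
segments; the ordinal array itself stays global.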

On Wed, Oct 21, 2009 at 02:07, TomS <tomshall...@googlemail.com> wrote:
> Hi,
>
> I can confirm the below mentioned problems trying to migrate to 2.9.
>
> Our Lucene-based (2.4) app uses custom multi-level sorting on a lot of
> different fields and pretty large indexes (> 100m docs). Most of the fields
> that we sort on are strings, some with up to 400 characters in length. A lot
> of the strings share a common prefix, so comparing these
> character-by-character is costly. As the comparison proved to be a hotspot
> (and ordinary string field caches a memory hog), we've been using an
> approach similar to the one mentioned by John Wang for years now, starting
> with Lucene 1.x: a map from doc id to ordinal (int->int), where the ordinal
> represents the order imposed by the (costly) comparison.
>
> This has been rather simple to implement and gives our app sizeable
> performance gains. Furthermore, it saves a lot of memory compared to
> standard field caches.
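The doc-id-to-ordinal construction described above can be sketched roughly
like this (method names are hypothetical; the expensive string comparison
runs once at build time, after which sorting needs only int compares, and
one int per doc replaces a cached string):

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch: turn a per-doc string value into a per-doc sort
// ordinal, so later comparisons are cheap int compares.
public class OrdinalMapBuilder {
    // Returns ords[docid] = rank of that doc's value under the costly
    // string comparison. Equal values receive distinct but
    // order-preserving ordinals.
    public static int[] buildOrdinals(String[] valueByDoc) {
        Integer[] docs = new Integer[valueByDoc.length];
        for (int i = 0; i < docs.length; i++) docs[i] = i;
        // The costly character-by-character comparison happens here, once.
        Arrays.sort(docs, Comparator.comparing(d -> valueByDoc[d]));
        int[] ords = new int[valueByDoc.length];
        for (int rank = 0; rank < docs.length; rank++) {
            ords[docs[rank]] = rank;
        }
        return ords;
    }

    public static void main(String[] args) {
        // doc0="bb", doc1="aa", doc2="ab"
        System.out.println(Arrays.toString(
            buildOrdinals(new String[]{"bb", "aa", "ab"})));
        // prints [2, 0, 1]
    }
}
```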
>
> With the new API, though, the simple mapping from doc id to pre-generated
> sort ordinals won't work anymore. This is the only reason that has prevented
> us from upgrading to 2.9. Currently it seems to us that adapting to the new
> sorting API would be either much more complicated to implement, or - when
> reverting to standard field cache-based sorting - slower and use far more
> memory. On the other hand, we might not have a full understanding of 2.9's
> inner workings yet, being used to (hopefully not stuck with) our
> above-mentioned custom solution.
>
> Other than that, 2.9 seems to be a very nice release. Thanks for the great
> work!
>
> -Tom
>
>
> On Tue, Oct 20, 2009 at 8:55 AM, John Wang <john.w...@gmail.com> wrote:
> [...]
> Let me provide an example:
>     We have a multi-valued field stored as integers (ordinals) standing in
> for strings. We define a sort on this set of strings with a comparator on
> each value, similar to a lexicographic order, except that instead of
> comparing characters we compare whole strings. We also want to keep the
> multi-valued representation, since we do filtering and facet counting on
> it. The in-memory representation is similar to the UnInvertedField in Solr.
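A rough sketch of such a docid-to-ordinal-set representation and its
comparator (illustrative names, not Solr's UnInvertedField code; each doc
holds a sorted array of value ordinals and two docs compare
lexicographically on those arrays):

```java
// Illustrative sketch: multi-valued sort keys kept as per-doc ordinal
// arrays, compared with cheap int compares instead of string compares.
public class MultiValuedOrdinals {
    private final int[][] ordsByDoc; // docid -> sorted ordinals of its values

    public MultiValuedOrdinals(int[][] ordsByDoc) {
        this.ordsByDoc = ordsByDoc;
    }

    // Lexicographic comparison over each doc's ordinal array.
    public int compareDocs(int docA, int docB) {
        int[] a = ordsByDoc[docA], b = ordsByDoc[docB];
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return Integer.compare(a.length, b.length); // shorter set sorts first
    }

    public static void main(String[] args) {
        MultiValuedOrdinals m =
            new MultiValuedOrdinals(new int[][]{{0, 2}, {0, 3}, {1}});
        System.out.println(m.compareDocs(0, 1)); // prints a negative number
    }
}
```

The same per-doc arrays can then serve filtering and facet counting, which
is the reason given above for keeping the multi-valued representation.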
>    Implementing a sort with the old API was rather simple, as we only
> needed a mapping from a docid to a set of ordinals. With the new API, we
> would need to do a "conversion", mapping a set of Strings/ordinals back to
> a doc, which to me is not trivial, let alone its performance implications.
>    That actually gave us the motivation to see whether the old API could
> handle the segment-level changes made in 2.9 (which, in my opinion, is the
> best thing in Lucene since payloads :) )
>    So after some investigation, with code and big-O analysis, and
> discussions with Mike and Yonik, we feel that, given the performance
> numbers on our end, it is unnecessary to go with the more complicated API.
> Thanks
> -John
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
