Re: Custom Sorting

Vitaly Funstein Wed, 25 Jun 2014 17:39:06 -0700

As a compromise, you can base your custom sort function on the values of
stored fields in the same index, as opposed to fetching them from an
external data store or relying on the internal sorting implementation in
Lucene. It will still be relatively slow, but not nearly as slow as going
out to a DB. You can also do some smart lazy caching (and selective
removal) of sort field values as you go along with the sorting.
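
To make the lazy caching idea concrete, here is a minimal sketch of a
bounded cache that loads a sort value on first use and evicts the least
recently used entry once a size limit is reached. The class name
LazySortValueCache and the loader callback are hypothetical and not part
of Lucene; in practice the loader would be whatever code reads the stored
field for a given doc id. It is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical helper (not a Lucene class): lazily caches per-document
// sort values and drops the least recently used one when full.
final class LazySortValueCache<V> {
    // Assumed callback, e.g. one that reads a stored field for a doc id.
    private final IntFunction<V> loader;
    private final Map<Integer, V> cache;
    private int loads = 0;

    LazySortValueCache(final int maxEntries, IntFunction<V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most
        // recently accessed, so the "eldest" entry is the LRU one.
        this.cache = new LinkedHashMap<Integer, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, loading it on a miss.
    // A null result from the loader is not remembered as a hit.
    V get(int docId) {
        V value = cache.get(docId);
        if (value == null) {
            value = loader.apply(docId);
            loads++;
            cache.put(docId, value);
        }
        return value;
    }

    int loads() { return loads; }

    int size() { return cache.size(); }
}
```

A comparator would call get(docId) instead of hitting the stored field
directly, so the cache size becomes the knob that trades memory for the
number of repeated loads.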


If I understand this correctly, FieldCache slurps in the value of every
sort field for each document in the index up front, and holds on to them
at least for the duration of the search (or until the reader is closed,
which may actually be even later)... although there's probably more to it
than I am describing, which I'll leave up to the experts to elaborate on.


On Wed, Jun 25, 2014 at 5:10 PM, Erick Erickson <[email protected]>
wrote:

> Sure, you can write a custom function, see:
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
> And you can invoke your custom function since sorting by function is
> supported.
>
> But my point remains. To be performant, you'll have to cache the
> results. Which is what's happening already.
> If you do something clever that tries to purge old values that you're
> sorting by, then you'll probably run into
> performance issues. At least that's my guess.
>
> I think this will be a dead-end for you, but would love to be proved
> wrong about that....
>
> Best
> Erick
>
> On Wed, Jun 25, 2014 at 4:34 AM, Sandeep Khanzode
> <[email protected]> wrote:
> > Hi,
> >
> > Thanks for your reply.
> > Actually, I am evaluating both approaches.
> >
> > With the sort being performed on a field indexed in Lucene itself, my
> > concern is with the FieldCache. Very quickly, for multiple clients
> > executing in parallel, it bumps up to 8-10GB. This is for 4-5 different
> > sort fields using an index corpus of 50M documents. The problem is not
> > so much the memory consumption as controlling it. If the max heap
> > argument for the JVM is scaled back to 2-3GB, then all clients throw an
> > OOM. How should the FieldCache scale based on the max memory available
> > to the JVM? Can it be selectively turned off, or can it implement an
> > LRU type of algorithm to purge old entries?
> >
> > Secondly, on the DB approach: yes, it will not perform. However, I just
> > wanted to know whether a custom sort function exists that allows one
> > to write their own sort on a field that is not indexed by Lucene.
> >
> > Thanks again,
> >
> > -----------------------
> > Thanks n Regards,
> > Sandeep Ramesh Khanzode
> >
> >
> > On Wednesday, June 25, 2014 1:21 AM, Erick Erickson
> > <[email protected]> wrote:
> >
> >
> >
> > I'm a little confused here. Sure, sorting on a number of fields will
> > increase memory; the basic idea here is that you need to cache all the
> > sort values (plus support structures) for performance reasons.
> >
> > If you create your own custom sort that goes out to a DB and gets the
> > doc, you have to be prepared for
> > q=*:*&sort=custom_function
> > Which means you'll have to fetch the value for each and every document
> > in the index. If this is a DB call, it will NOT perform.
> >
> > In order to be performant, you'll need to cache the values. Which is
> > what is being done _for_ you by the FieldCache.
> >
> > So I think this is really a false path, or an "XY" problem. Why do you
> > think you need to do this?
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Jun 24, 2014 at 10:31 AM, Sandeep Khanzode
> > <[email protected]> wrote:
> >> Hi,
> >>
> >> I am trying to implement a sort order for search results in Lucene
> >> 4.7.2.
> >>
> >> If I want to use data for ordering that is not stored in Lucene as
> >> Fields, is there any way this can be done?
> >> Basically, I would have certain data that is associated logically with
> >> a document but stored elsewhere, like a DB. Can I create a custom sort
> >> function along the lines of a FieldComparator to sort based on this
> >> data by plugging it inside the sort function?
> >>
> >> I have tested the performance of the Sort function for String and
> >> numeric types, and as mentioned in some blog, it seems that the numeric
> >> type is much faster compared to the string type. However, if I sort on
> >> a number of fields from multiple clients, the memory footprint, due to
> >> the FieldCache.DEFAULT impl, increases approximately 5-6 times. If I
> >> run this on a machine which does not have this capacity, will I get an
> >> OOM, or will there be intense thrashing for the memory?
> >>
> >>
> >> -----------------------
> >> Thanks n Regards,
> >> Sandeep Ramesh Khanzode
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
