I extended the IndexSearcher last night and set it up to create one task per IndexReader instead of one per AtomicReaderContext. Performance was just as bad as before, so it looks like I'm stuck merging everything down to a single segment.
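
In case anyone wants to sanity-check what I did, the override looked roughly like this (a sketch - the class name is mine, and it assumes that grouping leaves by the public parent field on IndexReaderContext lines up one-to-one with the shards in the MultiReader):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.CompositeReaderContext;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class PerReaderSearcher extends IndexSearcher {

  public PerReaderSearcher(IndexReader reader, ExecutorService executor) {
    super(reader, executor);
  }

  // slices() is invoked from the IndexSearcher constructor, so it must not
  // touch subclass state. The default creates one LeafSlice (= one executor
  // task) per segment; this groups segments by the shard they came from.
  @Override
  protected LeafSlice[] slices(List<AtomicReaderContext> leaves) {
    Map<CompositeReaderContext, List<AtomicReaderContext>> byShard =
        new LinkedHashMap<CompositeReaderContext, List<AtomicReaderContext>>();
    for (AtomicReaderContext leaf : leaves) {
      List<AtomicReaderContext> group = byShard.get(leaf.parent);
      if (group == null) {
        group = new ArrayList<AtomicReaderContext>();
        byShard.put(leaf.parent, group);
      }
      group.add(leaf);
    }
    // One LeafSlice, and therefore one executor task, per shard.
    LeafSlice[] slices = new LeafSlice[byShard.size()];
    int i = 0;
    for (List<AtomicReaderContext> group : byShard.values()) {
      slices[i++] = new LeafSlice(group.toArray(new AtomicReaderContext[group.size()]));
    }
    return slices;
  }
}

Each LeafSlice still searches its segments sequentially on one thread, which is presumably why this didn't help: a per-shard task can only go as fast as a serial scan over that shard's ~10 segments.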
I went through the documentation for the various merge policies and tried a few different configurations, but couldn't find one that naturally caps the number of segments at 1. The most promising options either had undocumented limits in their setters or they didn't behave quite like I expected. I'll spend some more time playing with it tonight, but in the meantime I don't suppose anyone else knows a way to accomplish what I'm trying to do without using forceMerge(1)?
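
To make the question concrete, this is roughly what I tried (a sketch - the values and path are made up, and I'm assuming the 4.4 Version constants). TieredMergePolicy looked the most promising, but its setters reject anything below 2, so the policy alone never collapses the index to one segment and I end up back at the explicit forceMerge(1):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class MergeSetup {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder

    // Push the policy toward as few segments as possible.
    TieredMergePolicy tmp = new TieredMergePolicy();
    tmp.setSegmentsPerTier(2.0); // setter throws IllegalArgumentException below 2.0
    tmp.setMaxMergeAtOnce(2);    // setter throws IllegalArgumentException below 2
    tmp.setMaxMergedSegmentMB(1024 * 1024); // ~1 TB, i.e. effectively unbounded

    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_44,
        new StandardAnalyzer(Version.LUCENE_44));
    iwc.setMergePolicy(tmp);

    IndexWriter writer = new IndexWriter(dir, iwc);
    try {
      // ... add/update documents ...
      writer.forceMerge(1); // the call I'd like to make unnecessary
    } finally {
      writer.close();
    }
  }
}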
On Tue, Oct 1, 2013 at 6:10 PM, Desidero <desid...@gmail.com> wrote:
> Uwe,
>
> I was using a bounded thread pool.
>
> I don't know if the problem was the task overload or something about the
> actual efficiency of searching a single segment rather than iterating over
> multiple AtomicReaderContexts, but I'd lean toward task overload. I will
> do some testing tonight to find out for sure.
>
> Matt
>
> Hi,
>
> use a bounded thread pool.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Desidero [mailto:desid...@gmail.com]
> > Sent: Tuesday, October 01, 2013 11:37 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Query performance in Lucene 4.x
> >
> > For anyone who was wondering, this was actually resolved in a different
> > thread today. I misread the information in the
> > IndexSearcher(IndexReader,ExecutorService) constructor documentation - I
> > was under the impression that it was submitting a thread for each index
> > shard (MultiReader wraps 20 shards, so 20 tasks) but it was really
> > submitting a task for each segment within each shard (20 shards * ~10
> > segments = ~200 tasks) which is horrible. Since my index changes
> > infrequently, I'm using forceMerge(1) before sending out updated indexes
> > to the slave servers. Without any extra tuning (threads, # of shards,
> > etc.) I've gone from ~2900 requests per minute to ~10k requests per
> > minute.
> >
> > Thanks to Adrien and Mike for the clarification and Benson for bringing
> > up the question that led to my answer.
> >
> > I'm still pretty new to Lucene so I have a lot of poking around to do,
> > but I'm going to try to implement the "virtual segment" concept that
> > Mike mentioned. It'll be really helpful for those of us who want
> > parallelism within queries and don't want to forceMerge.
> >
> > On Fri, Sep 27, 2013 at 9:55 AM, Desidero <desid...@gmail.com> wrote:
> >
> > > Erick,
> > >
> > > Thank you for responding.
> > >
> > > I ran tests using both compressed fields and uncompressed fields, and
> > > it was significantly slower with uncompressed fields. I looked into
> > > the lazy field loading per your suggestion, but we don't get any
> > > values from the returned Documents until the result set has been
> > > appropriately reduced. Since we only store one retrievable field and
> > > we always need to get it, it doesn't save any time loading it lazily.
> > >
> > > I'll try running a test without loading any fields just to see how it
> > > affects performance and let you know how that goes.
> > >
> > > Regards,
> > > Matt
> > >
> > > On Fri, Sep 27, 2013 at 8:01 AM, Erick Erickson
> > <erickerick...@gmail.com> wrote:
> > >
> > >> Hmmm, since 4.1, fields have been stored compressed by default.
> > >> I suppose it's possible that this is a result of
> > >> compressing/uncompressing.
> > >>
> > >> What happens if
> > >> 1> you enable lazy field loading
> > >> 2> don't load any fields?
> > >>
> > >> FWIW,
> > >> Erick
> > >>
> > >> On Thu, Sep 26, 2013 at 10:55 AM, Desidero <desid...@gmail.com> wrote:
> > >> > A quick update:
> > >> >
> > >> > In order to confirm that none of the standard migration changes had
> > >> > a negative effect on performance, I ported my Lucene 4.x version
> > >> > back to Lucene 3.6.2 and kept the newer API rather than using the
> > >> > custom ParallelMultiSearcher and other deprecated methods/classes.
> > >> >
> > >> > Performance in 3.6.2 is even faster than before (~2900 requests/min
> > >> > with 4.x vs ~6200 requests/min with 3.6.2), so none of my code
> > >> > changes should be causing the difference. It seems to be something
> > >> > Lucene is doing under the covers.
> > >> >
> > >> > Again, if there's any other information I can provide to help
> > >> > determine what's going on, please let me know.
> > >> >
> > >> > Thanks,
> > >> > Matt
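
P.S. For anyone who finds this thread in the archives later, the setup under discussion boils down to something like the following (a sketch - the paths, pool sizes, and queue bound are made up). It's the bounded pool Uwe recommended feeding the IndexSearcher(IndexReader, ExecutorService) constructor, which submits one task per leaf slice for each query:

import java.io.File;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class SearcherSetup {
  public static IndexSearcher open(int numShards) throws Exception {
    IndexReader[] shards = new IndexReader[numShards];
    for (int i = 0; i < numShards; i++) {
      shards[i] = DirectoryReader.open(
          FSDirectory.open(new File("/path/to/shard-" + i))); // placeholder
    }
    // Bounded pool: at most 16 threads and 100 queued tasks;
    // CallerRunsPolicy applies back-pressure instead of rejecting work.
    // Shut the executor down when the searcher is retired.
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        16, 16, 60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(100),
        new ThreadPoolExecutor.CallerRunsPolicy());
    // One task per leaf slice is submitted to this executor per query.
    return new IndexSearcher(new MultiReader(shards), executor);
  }
}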