Re: forceMerge(1) leads to ~10% perf gains

Ishan Chattopadhyaya Sat, 30 Sep 2023 09:40:42 -0700

Also, try index sorting. Often, there are performance gains to be had with
the right sort key for various query workloads.
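Here is a rough illustration of why index sorting can pay off. If documents are stored in the same order a query sorts by, a top-k search can stop after the first k matches instead of visiting every match. The Java sketch below is a toy model built on that assumption, not Lucene code, and its class and method names are hypothetical. In Lucene itself the sort is configured at index time with IndexWriterConfig.setIndexSort(Sort), and this kind of early termination can apply when the query's sort matches the index sort.

```java
import java.util.Collections;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

class IndexSortEarlyTermination {
    /** Toy result: the k smallest matching values plus how many docs were visited. */
    static final class Result {
        final int[] topK;
        final int visited;

        Result(int[] topK, int visited) {
            this.topK = topK;
            this.visited = visited;
        }
    }

    /**
     * Returns the k smallest matching values. When docs are already stored in ascending
     * value order (the toy equivalent of an index sort on that field), the first k
     * matches are final, so the scan stops early.
     */
    static Result topK(int[] values, IntPredicate matches, int k, boolean sortedByValue) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder()); // max-heap
        int visited = 0;
        for (int v : values) {
            visited++;
            if (!matches.test(v)) continue;
            heap.add(v);
            if (heap.size() > k) heap.poll(); // evict the largest kept value
            if (sortedByValue && heap.size() == k) break; // later docs cannot be smaller
        }
        int[] top = heap.stream().mapToInt(Integer::intValue).sorted().toArray();
        return new Result(top, visited);
    }

    public static void main(String[] args) {
        int[] unsorted = {5, 1, 9, 3, 7, 2, 8, 4, 6, 10};
        int[] sorted = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        Result a = topK(unsorted, v -> true, 3, false);
        Result b = topK(sorted, v -> true, 3, true);
        System.out.println(a.visited + " docs visited without index sort"); // 10
        System.out.println(b.visited + " docs visited with index sort");    // 3
    }
}
```

Both runs return [1, 2, 3]. The sorted run stops after 3 documents, while the unsorted run has to look at all 10.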


On Fri, 22 Sept, 2023, 4:28 pm Adrien Grand, <[email protected]> wrote:

> > Was wondering: are there any other techniques that can speed things up
> > the way forceMerge does here?
>
> Lucene 9.8 (hopefully released in a few days) will add support for
> recursive graph bisection, another technique that can speed up
> querying on read-only indices.
>
> https://github.com/apache/lucene/pull/12489
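Recursive graph bisection renumbers documents so that documents sharing many terms end up with nearby doc IDs. The toy Java cost model below shows why that can help. Postings are stored as gaps between doc IDs, and smaller gaps need fewer bits, which generally means less data to decode per query. The class name is hypothetical, and the bits-per-gap estimate is a simplification, not Lucene's real postings format.

```java
class DocIdGapCost {
    /**
     * Toy cost model: bits needed to write each gap (delta) of a sorted postings list.
     * This is not Lucene's actual postings encoding; it only shows why small gaps are cheaper.
     */
    static int gapBits(int[] sortedDocIds) {
        int total = 0;
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous; // the first "gap" is the doc ID itself
            total += Math.max(1, 32 - Integer.numberOfLeadingZeros(gap)); // at least 1 bit per gap
            previous = docId;
        }
        return total;
    }

    public static void main(String[] args) {
        // The same four documents match a term, before and after a hypothetical
        // reordering that gave them neighboring doc IDs.
        System.out.println(gapBits(new int[] {3, 40, 77, 114})); // 20
        System.out.println(gapBits(new int[] {0, 1, 2, 3}));     // 4
    }
}
```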
>
> On Fri, Sep 22, 2023 at 12:54 PM Uwe Schindler <[email protected]> wrote:
> >
> > Hi,
> >
> > Yes, a force-merged index can be faster, as less work is spent on
> > looking up terms in different index segments.
> >
> > If you are looking for higher speed, non-merged indexes can actually
> > perform better, IF you parallelize. You can do this by passing an
> > Executor instance to the IndexSearcher constructor:
> > <https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/IndexSearcher.html#%3Cinit%3E(org.apache.lucene.index.IndexReader,java.util.concurrent.Executor)>
> > If you do this, each segment of the index is searched in parallel
> > (within the thread pool limits of the Executor) and the results are
> > merged at the end.
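Here is a minimal sketch of the pattern described above. It uses a toy model in which a "segment" is just an array of scores. Each segment's top-k is computed on the Executor, and the partial results are merged at the end. The names are hypothetical, and this is not Lucene's implementation, which among other things may group small segments into slices. With Lucene itself you would simply pass the Executor to new IndexSearcher(reader, executor).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelSegmentSearch {
    /** Top-k highest scores inside a single toy "segment" (just an array of scores). */
    static List<Float> searchSegment(float[] scores, int k) {
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap: root is the weakest kept hit
        for (float s : scores) {
            heap.add(s);
            if (heap.size() > k) heap.poll();
        }
        return new ArrayList<>(heap);
    }

    /** Searches every segment on the executor, then merges the partial top-k lists. */
    static List<Float> search(List<float[]> segments, int k, Executor executor) {
        List<CompletableFuture<List<Float>>> partials = new ArrayList<>();
        for (float[] segment : segments) {
            partials.add(CompletableFuture.supplyAsync(() -> searchSegment(segment, k), executor));
        }
        List<Float> merged = new ArrayList<>();
        for (CompletableFuture<List<Float>> partial : partials) {
            merged.addAll(partial.join()); // wait for each segment's partial result
        }
        merged.sort(Collections.reverseOrder()); // best scores first
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<float[]> segments = List.of(
                new float[] {0.1f, 0.9f, 0.5f},
                new float[] {0.8f, 0.2f},
                new float[] {0.7f, 0.95f, 0.3f});
            System.out.println(search(segments, 3, pool)); // [0.95, 0.9, 0.8]
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task keeps only its own top-k, so the merge step handles at most k hits per segment instead of every match.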
> >
> > If an index is static and read-only, force-merging is a good idea,
> > unless you want to parallelize.
> >
> > Tokenizing and joining with OR is the correct approach, but for speed you
> > may also use AND. To improve speed further, also take a look at Block-Max
> > WAND: if you are not interested in the total number of matching
> > documents, you can get huge speed improvements. This is enabled by
> > default in Lucene 9.x with the default IndexSearcher, but on
> > Solr/Elasticsearch you may need to request it explicitly. When it is
> > enabled, hits are only counted exactly until 1000 documents are found.
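Here is a toy model of the Block-Max WAND idea. It assumes each block of postings carries a precomputed maximum score. Hits are counted exactly until a threshold is reached. After that, blocks that cannot beat the current k-th best score are skipped, so the hit count becomes a lower bound. This is a simplified sketch with hypothetical names, not Lucene's implementation. In Lucene the outcome is reported through TopDocs.totalHits, whose relation becomes TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO when counting stopped early.

```java
import java.util.List;
import java.util.PriorityQueue;

class BlockMaxSketch {
    /** A block of postings with a precomputed upper bound on the scores inside it. */
    static final class Block {
        final float maxScore;
        final float[] scores;

        Block(float maxScore, float[] scores) {
            this.maxScore = maxScore;
            this.scores = scores;
        }
    }

    static final class Result {
        final float[] topK;    // best scores, descending
        final int countedHits; // hits actually visited and counted
        final boolean exact;   // false once any block was skipped

        Result(float[] topK, int countedHits, boolean exact) {
            this.topK = topK;
            this.countedHits = countedHits;
            this.exact = exact;
        }
    }

    static Result search(List<Block> blocks, int k, int totalHitsThreshold) {
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of the k best scores
        int counted = 0;
        boolean skipped = false;
        for (Block block : blocks) {
            // Past the threshold, a block that cannot beat the current k-th best score
            // is skipped without scoring or counting its documents.
            if (counted >= totalHitsThreshold && k > 0 && heap.size() == k
                    && block.maxScore <= heap.peek()) {
                skipped = true;
                continue;
            }
            for (float score : block.scores) {
                counted++;
                heap.add(score);
                if (heap.size() > k) heap.poll();
            }
        }
        float[] top = new float[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) top[i] = heap.poll(); // smallest goes last
        return new Result(top, counted, !skipped);
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(5f, new float[] {5f, 1f}),
            new Block(4f, new float[] {4f, 2f}),
            new Block(1f, new float[] {1f, 0.5f}),
            new Block(6f, new float[] {6f, 3f}));
        Result r = search(blocks, 2, 2);
        System.out.println(r.countedHits + " hits counted, exact=" + r.exact); // 6 hits counted, exact=false
    }
}
```

With a threshold of 2, the third block is skipped and only 6 of the 8 hits are counted. With a threshold of 100, all 8 are counted. The top-2 scores are 6.0 and 5.0 in both cases.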
> >
> > Uwe
> >
> > On 22.09.2023 at 03:40, qrdl kaggle wrote:
> > > After testing on 4800 fairly complex queries, I see a performance gain of
> > > roughly 10% (from 209 ms to 185 ms per query) after running
> > > indexWriter.forceMerge(1); indexWriter.commit();
> > >
> > > Queries are quite complex, often about 30 words, of the form
> > > text:<word> OR text:<word> OR ...
> > >
> > > The forceMerge reduced the index from 214 files to 14.
> > >
> > > It's a 6GB static/read-only index with about 6.4M documents. Documents
> > > are around 1MB or so of text.
> > >
> > > Was wondering: are there any other techniques that can speed things up
> > > the way forceMerge does here?
> > >
> > > Is there a better way to query, while still maintaining accuracy, than
> > > simply tokenizing a sentence into words and joining them with OR text:?
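Here is a minimal sketch of the tokenize-and-join approach described above. It produces classic query-parser syntax joined with either OR or AND. The tokenizer is an assumption made for illustration: it lowercases, splits on anything that is not a letter or digit, and drops duplicates. In practice the query should be analyzed with the same Analyzer used at index time. With Lucene's API, the equivalent is a BooleanQuery of TermQuery clauses added with BooleanClause.Occur.SHOULD (OR) or BooleanClause.Occur.MUST (AND). The class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

class OrQueryBuilder {
    /** Lowercases, splits on anything that is not a letter or digit, and drops duplicates. */
    static Set<String> tokenize(String sentence) {
        Set<String> words = new LinkedHashSet<>(); // keeps first-seen order
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) words.add(token); // a leading delimiter yields one empty token
        }
        return words;
    }

    /** Builds classic query-parser syntax, e.g. "text:a OR text:b" or "text:a AND text:b". */
    static String build(String field, String sentence, boolean requireAll) {
        StringJoiner joiner = new StringJoiner(requireAll ? " AND " : " OR ");
        for (String word : tokenize(sentence)) joiner.add(field + ":" + word);
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("text", "The quick, quick fox!", false)); // text:the OR text:quick OR text:fox
        System.out.println(build("text", "The quick, quick fox!", true));  // text:the AND text:quick AND text:fox
    }
}
```

Because the sketch only emits letters and digits, it needs no query-parser escaping. Arbitrary input would need escaping before being parsed.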
> > >
> > --
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: [email protected]
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
