> Was wondering - are there any other techniques which can be used to speed
> up that work well when forceMerge works like this?

Lucene 9.8 (to be released in a few days hopefully) will add support
for recursive graph bisection, which is another technique that can be
used to speed up querying on read-only indices.

https://github.com/apache/lucene/pull/12489

On Fri, Sep 22, 2023 at 12:54 PM Uwe Schindler <u...@thetaphi.de> wrote:
>
> Hi,
>
> Yes, a force-merged index can be faster, as less work is spent on
> looking up terms in different index segments.
>
> If you are looking for higher speed, non-merged indexes can actually
> perform better, IF you parallelize. You can do this by adding an
> Executor instance to IndexSearcher
> (<https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/IndexSearcher.html#%3Cinit%3E(org.apache.lucene.index.IndexReader,java.util.concurrent.Executor)>).
> If you do this, the segments of the index are searched in parallel (using
> the thread pool limits of the Executor) and the results are merged at the end.
>
> If an index is read-only and static, force-merge is a good idea - unless
> you want to parallelize.
>
> Tokenizing and joining with OR is the correct way, but for speed you may
> also use AND. To further improve the speed, also take a look at Block-Max
> WAND: If you are not interested in the total number of matching documents,
> you can get huge speed improvements. By default this is enabled in Lucene
> 9.x with the default IndexSearcher, but on Solr/Elasticsearch you may need
> to actively request it. In that case it will only count the exact number of
> hits until 1000 docs are found.
>
> Uwe
>
> On 22.09.2023 at 03:40, qrdl kaggle wrote:
> > After testing on 4800 fairly complex queries, I see a performance gain of
> > 10% after doing indexWriter.forceMerge(1); indexWriter.commit(); from 209
> > ms per query, to 185 ms per query.
> >
> > Queries are quite complex, often about 30 or so words, of the format OR
> > text:<word>
> >
> > It went from 214 to 14 files on the forceMerge.
> >
> > It's a 6GB static/read only index with about 6.4M documents.  Documents are
> > around 1MB or so of text.
> >
> > Was wondering - are there any other techniques which can be used to speed
> > up that work well when forceMerge works like this?
> >
> > Is there a better way to query and still maintain accuracy than simply word
> > tokenizing a sentence and joining with OR text: ?
> >
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
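
For reference, the Executor suggestion quoted above boils down to "search
every segment on its own thread, then merge the per-segment top hits". In
Lucene that work happens inside IndexSearcher once it is given an Executor.
The sketch below is only a plain-Java model of the idea, not Lucene code:
the class ParallelSegmentSearch, the Hit type and the "segments" (lists of
pre-scored hits) are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

final class ParallelSegmentSearch {
    // Hypothetical scored hit; stands in for a real search result.
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
    }

    // Returns the k highest-scoring hits, best first.
    static List<Hit> topK(List<Hit> hits, int k) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    // Computes each segment's top k on the executor, then merges them.
    static List<Hit> search(List<List<Hit>> segments, int k, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Future<List<Hit>>> pending = new ArrayList<>();
        for (List<Hit> segment : segments) {
            pending.add(executor.submit(() -> topK(segment, k)));
        }
        List<Hit> candidates = new ArrayList<>();
        for (Future<List<Hit>> f : pending) {
            candidates.addAll(f.get()); // wait for this segment to finish
        }
        return topK(candidates, k); // global top k over all segments
    }

    public static void main(String[] args) throws Exception {
        List<List<Hit>> segments = Arrays.asList(
                Arrays.asList(new Hit(1, 0.5f), new Hit(2, 2.0f)),
                Arrays.asList(new Hit(7, 3.0f), new Hit(8, 0.1f)));
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            for (Hit h : search(segments, 2, executor)) {
                System.out.println("doc " + h.docId + " score " + h.score);
            }
        } finally {
            executor.shutdown();
        }
    }
}
```

The model also shows why a force-merged index gains nothing from this: with
a single segment there is only one task to hand to the pool.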

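The OR versus AND choice quoted above only changes the operator that joins
the per-word clauses. A minimal sketch of building both query strings,
assuming plain whitespace splitting in place of a real analyzer and no
escaping of query-syntax characters; QueryStrings and join are hypothetical
names:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class QueryStrings {
    // Builds "field:w1 OP field:w2 ..." from a whitespace-separated sentence.
    static String join(String sentence, String field, String operator) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .map(word -> field + ":" + word)
                .collect(Collectors.joining(" " + operator + " "));
    }

    public static void main(String[] args) {
        System.out.println(join("fast static index", "text", "OR"));
        System.out.println(join("fast static index", "text", "AND"));
    }
}
```

For the sentence "fast static index" this prints
"text:fast OR text:static OR text:index" and then
"text:fast AND text:static AND text:index".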

-- 
Adrien

