> Was wondering - are there any other techniques which can be used to speed up that work well when forceMerge works like this?
Lucene 9.8 (to be released in a few days, hopefully) will add support for
recursive graph bisection, which is another thing that can be used to speed
up querying on read-only indices.

https://github.com/apache/lucene/pull/12489

On Fri, Sep 22, 2023 at 12:54 PM Uwe Schindler <u...@thetaphi.de> wrote:
>
> Hi,
>
> Yes, a force-merged index can be faster, as less work is spent on
> looking up terms in different index segments.
>
> If you are looking for higher speed, non-merged indexes can actually
> perform better, IF you parallelize. You can do this by adding an
> Executor instance to IndexSearcher
> (<https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/IndexSearcher.html#%3Cinit%3E(org.apache.lucene.index.IndexReader,java.util.concurrent.Executor)>).
> If you do this, each segment of the index is searched in parallel (using
> the thread pool limits of the Executor) and the results are merged at the
> end.
>
> If an index is read-only and static, force-merge is a good idea - unless
> you want to parallelize.
>
> Tokenizing and joining with OR is the correct way, but for speed you may
> also use AND. To further improve the speed, also take a look at Blockmax
> WAND: If you are not interested in the total number of documents, you
> can get huge speed improvements. By default this is enabled in Lucene
> 9.x with the default IndexSearcher, but on Solr/Elasticsearch you may need
> to actively request it. In that case it will only count the exact number
> of hits until 1000 docs are found.
>
> Uwe
>
> On 22.09.2023 at 03:40, qrdl kaggle wrote:
> > After testing on 4800 fairly complex queries, I see a performance gain of
> > 10% after doing indexWriter.forceMerge(1); indexWriter.commit(); from 209
> > ms per query, to 185 ms per query.
> >
> > Queries are quite complex, often about 30 or so words, of the format OR
> > text:<word>
> >
> > It went from 214 to 14 files on the forceMerge.
> >
> > It's a 6GB static/read only index with about 6.4M documents. Documents are
> > around 1MB or so of text.
> >
> > Was wondering - are there any other techniques which can be used to speed
> > up that work well when forceMerge works like this?
> >
> > Is there a better way to query and still maintain accuracy than simply word
> > tokenizing a sentence and joining with OR text: ?
> >
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

--
Adrien
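
To make the "search each segment in parallel, merge at the end" idea from the quoted reply concrete, here is a minimal plain-Java sketch of that fan-out/merge pattern. It is not Lucene code and does not claim to mirror Lucene's internals: the Hit class, searchSegment and searchAll are hypothetical names, and each "segment" is just an in-memory list of already-scored hits. In Lucene itself, passing an Executor to the IndexSearcher constructor linked above is what enables the parallel path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SegmentFanOut {
    /** Hypothetical scored hit; docId is assumed to be unique across segments. */
    public static final class Hit {
        public final int docId;
        public final double score;
        public Hit(int docId, double score) { this.docId = docId; this.score = score; }
        @Override public String toString() { return "doc=" + docId + " score=" + score; }
    }

    /** Orders hits from highest to lowest score. */
    private static final Comparator<Hit> BY_SCORE_DESC =
            (a, b) -> Double.compare(b.score, a.score);

    /** Stand-in for searching one segment: returns that segment's top-k hits. */
    public static List<Hit> searchSegment(List<Hit> segment, int k) {
        List<Hit> sorted = new ArrayList<>(segment);
        sorted.sort(BY_SCORE_DESC);
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Submits one task per segment to the pool, then merges the per-segment top-k lists. */
    public static List<Hit> searchAll(List<List<Hit>> segments, int k, ExecutorService pool) {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (List<Hit> segment : segments) {
            futures.add(pool.submit(() -> searchSegment(segment, k)));
        }
        List<Hit> merged = new ArrayList<>();
        try {
            for (Future<List<Hit>> f : futures) {
                merged.addAll(f.get()); // blocks until that segment's task is done
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        merged.sort(BY_SCORE_DESC);
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        List<List<Hit>> segments = Arrays.asList(
                Arrays.asList(new Hit(0, 1.0), new Hit(1, 3.5)),
                Arrays.asList(new Hit(2, 2.0), new Hit(3, 0.5)),
                Arrays.asList(new Hit(4, 4.2)));
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            for (Hit h : searchAll(segments, 2, pool)) {
                System.out.println(h);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch also shows why a single force-merged segment gives this pattern nothing to parallelize: with one segment there is only one task to submit.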
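
For the "tokenize and join with OR, or use AND for speed" point in the quoted reply, a small plain-Java sketch of building the query string either way may help. The class and method names (TermQueryStrings, tokenize, join) are hypothetical, and the tokenizer here is a naive split on anything that is not a letter or digit; in a real setup the tokens would have to match whatever analyzer was used at index time. Tokens are lowercased, which also keeps words like "and"/"or" from being read as operators if the resulting string is handed to a query parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TermQueryStrings {
    /** Naive tokenizer (an assumption): lowercase, split on non-letter/non-digit runs. */
    public static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Builds "field:tok1 OP field:tok2 ..." where operator is "OR" or "AND". */
    public static String join(String field, String sentence, String operator) {
        List<String> clauses = new ArrayList<>();
        for (String t : tokenize(sentence)) {
            clauses.add(field + ":" + t);
        }
        return String.join(" " + operator + " ", clauses);
    }

    public static void main(String[] args) {
        String sentence = "Quick brown fox";
        // OR: any document containing at least one of the words can match.
        System.out.println(join("text", sentence, "OR"));
        // AND: only documents containing every word match, so fewer candidates are scored.
        System.out.println(join("text", sentence, "AND"));
    }
}
```

Switching to AND changes which documents match, not just how fast they are found, so it trades recall for speed.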