Also, try index sorting. Often, there are performance gains to be had with the right sort key for various query workloads.
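
To make the idea concrete, here is a toy model in plain Java. It is not Lucene code, and the class and method names are made up for illustration. In Lucene itself the sort is declared at index time via IndexWriterConfig.setIndexSort(Sort). The sketch assumes the query wants the top k documents by that same key, which is the case where a sorted segment lets the scan stop early. Whether it helps relevance-ranked OR queries like the ones in this thread would need measuring.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.PriorityQueue;

public class IndexSortSketch {
    /** Top-k sort-key values plus the number of docs the scan had to examine. */
    static final class Result {
        final long[] hits;   // k largest values, in descending order
        final int visited;   // how many docs were looked at
        Result(long[] hits, int visited) { this.hits = hits; this.visited = visited; }
    }

    /** Unsorted segment: every doc is examined; a min-heap keeps the k largest. */
    static Result topKUnsorted(long[] segment, int k) {
        PriorityQueue<Long> heap = new PriorityQueue<>();
        int visited = 0;
        for (long value : segment) {
            visited++;
            heap.add(value);
            if (heap.size() > k) heap.poll(); // drop the current smallest
        }
        long[] hits = new long[heap.size()];
        // The heap yields values smallest first, so fill the array from the back.
        for (int i = hits.length - 1; i >= 0; i--) hits[i] = heap.poll();
        return new Result(hits, visited);
    }

    /** Segment stored in descending order of the sort key: stop after k docs. */
    static Result topKSorted(long[] sortedDesc, int k) {
        int n = Math.min(k, sortedDesc.length);
        return new Result(Arrays.copyOf(sortedDesc, n), n);
    }

    public static void main(String[] args) {
        long[] segment = new long[1000];
        // Distinct keys in a scrambled order (7919 is coprime to 1000).
        for (int i = 0; i < segment.length; i++) segment[i] = (i * 7919L) % 1000;
        long[] sortedDesc = Arrays.stream(segment).boxed()
                .sorted(Collections.reverseOrder())
                .mapToLong(Long::longValue).toArray();

        Result unsorted = topKUnsorted(segment, 10);
        Result sorted = topKSorted(sortedDesc, 10);
        System.out.println("same hits: " + Arrays.equals(unsorted.hits, sorted.hits));
        System.out.println("docs visited: " + unsorted.visited + " vs " + sorted.visited);
    }
}
```

Both scans return the same hits. The difference is only in how many documents each one has to touch, which is the effect a well-chosen sort key is meant to give.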

On Fri, 22 Sept, 2023, 4:28 pm Adrien Grand, <jpou...@gmail.com> wrote:

> > Was wondering - are there any other techniques which can be used to speed
> > up that work well when forceMerge works like this?
>
> Lucene 9.8 (to be released in a few days hopefully) will add support
> for recursive graph bisection, which is another thing that can be used
> to speed up querying on read-only indices.
>
> https://github.com/apache/lucene/pull/12489
>
> On Fri, Sep 22, 2023 at 12:54 PM Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > Hi,
> >
> > Yes, a force-merged index can be faster, as less work is spent on
> > looking up terms in different index segments.
> >
> > If you are looking for higher speed, non-merged indexes can actually
> > perform better, IF you parallelize. You can do this by adding an
> > Executor instance to IndexSearcher
> > (<https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/IndexSearcher.html#%3Cinit%3E(org.apache.lucene.index.IndexReader,java.util.concurrent.Executor)>).
> > If you do this each segment of the index is searched in parallel (using
> > the thread pool limits of the Executor) and results are merged at the end.
> >
> > If an index is read-only and static, force-merge is a good idea - unless
> > you want to parallelize.
> >
> > Tokenizing and joining with OR is the correct way, but for speed you may
> > also use AND. To further improve the speed also take a look at Blockmax
> > WAND: If you are not interested in the total number of documents, you
> > can get huge speed improvements. By default this is enabled in Lucene
> > 9.x with default IndexSearcher, but on Solr/Elasticsearch you may need
> > to actively request it. In that case it will only count the exact number
> > of hits until 1000 docs are found.
> >
> > Uwe
> >
> > On 22.09.2023 at 03:40, qrdl kaggle wrote:
> > > After testing on 4800 fairly complex queries, I see a performance gain of
> > > 10% after doing indexWriter.forceMerge(1); indexWriter.commit(); from 209
> > > ms per query, to 185 ms per query.
> > >
> > > Queries are quite complex, often about 30 or words, of the format OR
> > > text:<word>
> > >
> > > It went from 214 to 14 files on the forceMerge.
> > >
> > > It's a 6GB static/read only index with about 6.4M documents. Documents are
> > > around 1MB or so of text.
> > >
> > > Was wondering - are there any other techniques which can be used to speed
> > > up that work well when forceMerge works like this?
> > >
> > > Is there a better way to query and still maintain accuracy than simply word
> > > tokenizing a sentence and joining with OR text: ?
> > >
> > --
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
>
> --
> Adrien