Hi,

Yes, a force-merged index can be faster, as less work is spent on looking up terms in different index segments.

If you are looking for higher speed, non-merged indexes can actually perform better, IF you parallelize. You can do this by passing an Executor instance to the IndexSearcher constructor (<https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/IndexSearcher.html#%3Cinit%3E(org.apache.lucene.index.IndexReader,java.util.concurrent.Executor)>). If you do this, the segments of the index are searched in parallel (within the thread pool limits of the Executor) and the results are merged at the end.
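
To illustrate the idea only, here is a toy sketch in plain Java that does not use Lucene at all: each "segment" is just a list of scores, every segment is searched as its own task on an ExecutorService, and the per-segment top hits are merged at the end. The class and method names (ParallelSegmentSketch, topK, search) are made up for this example and say nothing about how IndexSearcher is implemented internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: "segments" are plain lists of scores, not Lucene segments.
class ParallelSegmentSketch {

    // Return the k highest scores of one list, best first.
    static List<Double> topK(List<Double> scores, int k) {
        List<Double> sorted = new ArrayList<>(scores);
        sorted.sort(Comparator.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Submit one task per segment, then merge the per-segment top hits.
    static List<Double> search(List<List<Double>> segments, int k, ExecutorService executor)
            throws Exception {
        List<Future<List<Double>>> futures = new ArrayList<>();
        for (List<Double> segment : segments) {
            futures.add(executor.submit(() -> topK(segment, k)));
        }
        List<Double> merged = new ArrayList<>();
        for (Future<List<Double>> future : futures) {
            merged.addAll(future.get()); // waits until that segment is done
        }
        return topK(merged, k);
    }

    public static void main(String[] args) throws Exception {
        // The pool size bounds how many segments are searched at the same time.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<List<Double>> segments = List.of(
                    List.of(0.2, 0.9, 0.4),
                    List.of(0.7, 0.1),
                    List.of(0.8, 0.3, 0.6));
            System.out.println(search(segments, 3, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

With a single segment there is only one task, so a force-merged index gains nothing from the thread pool in this model.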

If an index is read-only and static, force-merge is a good idea - unless you want to parallelize.

Tokenizing and joining with OR is the correct way, but for speed you may also use AND. To further improve the speed, also take a look at Block-Max WAND: if you are not interested in the exact total number of matching documents, you can get huge speed improvements. This is enabled by default in Lucene 9.x with the default IndexSearcher, but on Solr/Elasticsearch you may need to actively request it. When it is enabled, the number of hits is only counted exactly until 1000 documents have been found.
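
As a rough sketch of the tokenize-and-join step, the following plain Java builds a query string with either OR or AND between the clauses. The field name "text" is taken from the question below; the class name QueryStringSketch, the build method, and the very simple tokenization (lowercase, split on anything that is not a letter or digit) are assumptions for illustration. In a real setup the query terms should go through the same analyzer that was used at index time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: builds a query string, it does not run a search.
class QueryStringSketch {

    // Lowercase the sentence, split it on non-letter/non-digit characters,
    // and join the resulting field:term clauses with the given operator.
    static String build(String sentence, String field, String operator) {
        List<String> clauses = new ArrayList<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // skip the empty token before a leading delimiter
                clauses.add(field + ":" + token);
            }
        }
        return String.join(" " + operator + " ", clauses);
    }

    public static void main(String[] args) {
        String sentence = "Fast search, with Lucene!";
        System.out.println(build(sentence, "text", "OR"));  // any term may match
        System.out.println(build(sentence, "text", "AND")); // every term must match
    }
}
```

The AND variant matches fewer documents, which is where the speed-up comes from, but it also drops documents that contain only some of the words.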

Uwe

On 22.09.2023 at 03:40, qrdl kaggle wrote:
After testing on 4800 fairly complex queries, I see a performance gain of
10% after doing indexWriter.forceMerge(1); indexWriter.commit(); from 209
ms per query, to 185 ms per query.

Queries are quite complex, often about 30 or so words, of the format OR
text:<word>

It went from 214 to 14 files on the forceMerge.

It's a 6GB static/read only index with about 6.4M documents.  Documents are
around 1MB or so of text.

Was wondering - are there any other techniques which can be used to speed
up that work well when forceMerge works like this?

Is there a better way to query and still maintain accuracy than simply word
tokenizing a sentence and joining with OR text: ?

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

