Hi,
Yes, a force-merged index can be faster, because less work is spent
looking up each term in multiple index segments.
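To make the cost argument concrete, here is a toy model in plain Java with no Lucene classes. In this model, every segment has its own term dictionary, so each query term has to be probed once per segment. Segment, Lookup and lookup are hypothetical names. The model only counts dictionary probes, and real per-segment overhead in Lucene involves more than that.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model: each segment keeps its own term dictionary (not Lucene's data structures). */
final class SegmentTermLookup {

    /** docBase maps this segment's local doc ids to global ids. */
    record Segment(int docBase, Map<String, int[]> terms) {}

    /** Matching global doc ids (OR semantics) and the number of dictionary probes. */
    record Lookup(Set<Integer> docs, int dictionaryProbes) {}

    static Lookup lookup(List<Segment> segments, List<String> queryTerms) {
        Set<Integer> docs = new TreeSet<>();
        int probes = 0;
        for (String term : queryTerms) {
            for (Segment segment : segments) {
                probes++; // each term must be looked up in every segment's dictionary
                int[] postings = segment.terms().get(term);
                if (postings == null) continue; // term absent from this segment
                for (int localDoc : postings) docs.add(segment.docBase() + localDoc);
            }
        }
        return new Lookup(docs, probes);
    }

    public static void main(String[] args) {
        List<String> query = List.of("lucene", "merge");
        // The same five documents, stored as two segments or as one force-merged segment.
        List<Segment> twoSegments = List.of(
            new Segment(0, Map.of("lucene", new int[] {0, 2})),
            new Segment(3, Map.of("lucene", new int[] {0}, "merge", new int[] {1})));
        List<Segment> oneSegment = List.of(
            new Segment(0, Map.of("lucene", new int[] {0, 2, 3}, "merge", new int[] {4})));
        System.out.println(lookup(twoSegments, query));
        System.out.println(lookup(oneSegment, query));
    }
}
```

Both layouts return the same documents, but the two-segment layout needs twice as many probes. In this model the probe count is terms times segments, so it grows with every additional segment.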
If you are looking for higher speed, non-merged indexes can actually
perform better, IF you parallelize. You can do this by passing an
Executor instance to the IndexSearcher constructor
(<https://lucene.apache.org/core/9_7_0/core/org/apache/lucene/search/IndexSearcher.html#%3Cinit%3E(org.apache.lucene.index.IndexReader,java.util.concurrent.Executor)>).
If you do this, each segment of the index is searched in parallel
(within the thread-pool limits of the Executor), and the results are
merged at the end.
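In Lucene itself, this amounts to calling new IndexSearcher(reader, executor) through the constructor linked above. The sketch below is only a stdlib model of the idea. Each segment computes its own top-k on the executor, and the per-segment lists are then merged into a global top-k. ParallelSegmentSearch, Hit and topK are hypothetical names. Lucene actually schedules this work per slice, and a single slice may group several segments.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model of segment-parallel search with a final merge (not Lucene's internals). */
final class ParallelSegmentSearch {

    /** A scored hit; docId is assumed to be already global (docBase + local id). */
    record Hit(int docId, float score) {}

    /** Best score first; ties broken by lower docId. */
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble(Hit::score).reversed().thenComparingInt(Hit::docId);

    /** Top-k of one segment (stands in for scoring that segment's postings). */
    static List<Hit> topK(List<Hit> hits, int k) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(BEST_FIRST);
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Search each segment as its own task, then merge the per-segment top-k lists. */
    static List<Hit> search(List<List<Hit>> segments, int k, ExecutorService executor) {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (List<Hit> segment : segments) {
            futures.add(executor.submit(() -> topK(segment, k))); // runs on the pool's threads
        }
        List<Hit> candidates = new ArrayList<>();
        try {
            for (Future<List<Hit>> f : futures) {
                candidates.addAll(f.get()); // wait for every segment
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        // The global top-k is contained in the union of the per-segment top-k lists.
        return topK(candidates, k);
    }

    public static void main(String[] args) {
        List<List<Hit>> segments = List.of(
            List.of(new Hit(0, 1.2f), new Hit(1, 3.5f)),
            List.of(new Hit(2, 2.0f), new Hit(3, 0.4f)),
            List.of(new Hit(4, 4.1f)));
        ExecutorService pool = Executors.newFixedThreadPool(2); // pool size caps parallelism
        try {
            System.out.println(search(segments, 3, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Merging only the per-segment top-k lists is sufficient. Any document in the global top-k must also appear in the top-k of its own segment.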
If an index is read-only and static, force-merging is a good idea,
unless you want to parallelize.
Tokenizing and joining the terms with OR is the correct approach, but
for speed you could also use AND. To improve speed further, also take a
look at Block-Max WAND: if you are not interested in the exact total
number of matching documents, you can get huge speed improvements. It
is enabled by default in Lucene 9.x when you use the default
IndexSearcher, but on Solr/Elasticsearch you may need to request it
explicitly. With it enabled, hits are only counted exactly until 1000
hits have been found.
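The OR/AND trade-off can be illustrated with plain sorted postings lists. In Lucene, OR corresponds to adding each TermQuery to a BooleanQuery.Builder with BooleanClause.Occur.SHOULD, and AND corresponds to Occur.MUST. The code below is a hypothetical stdlib sketch (BooleanPostings, union, intersection) and does not show how Lucene actually evaluates a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy OR/AND over sorted doc-id postings (not Lucene's BooleanQuery implementation). */
final class BooleanPostings {

    /** OR: documents containing at least one of the two terms. */
    static List<Integer> union(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j == b.length || (i < a.length && a[i] < b[j])) {
                out.add(a[i++]);
            } else if (i == a.length || b[j] < a[i]) {
                out.add(b[j++]);
            } else { // same doc in both lists: emit it once
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }

    /** AND: documents containing both terms; never larger than the shorter list. */
    static List<Integer> intersection(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] quick = {1, 4, 7, 9};
        int[] fox = {2, 4, 9, 12};
        System.out.println("OR:  " + union(quick, fox));        // OR:  [1, 2, 4, 7, 9, 12]
        System.out.println("AND: " + intersection(quick, fox)); // AND: [4, 9]
    }
}
```

AND can never match more documents than its rarest term, so there is less to score. However, documents that are missing any of the query words are dropped, which can hurt recall.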
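A single-term simplification of the block-max idea shows why giving up exact total hit counts helps. Postings are split into blocks, and each block carries a precomputed maximum score. Once the hit-count threshold is reached, blocks whose maximum cannot beat the current k-th best score are skipped without being scored or counted. This is a hypothetical toy (BlockMaxSketch and its parameters are made up) and assumes non-negative scores. It is not Lucene's Block-Max WAND, which handles multi-term disjunctions. When Lucene computes a total this way, it reports it with TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO instead of EQUAL_TO.

```java
import java.util.PriorityQueue;

/** Toy, single-term illustration of block-max skipping (not Lucene's BMW code). */
final class BlockMaxSketch {

    record Result(float[] topScores, int totalHits, boolean totalIsExact, int blocksSkipped) {}

    /** Max score per block (assumes scores >= 0); a real engine stores this at index time. */
    static float[] blockMaxes(float[] scores, int blockSize) {
        float[] max = new float[(scores.length + blockSize - 1) / blockSize];
        for (int i = 0; i < scores.length; i++) {
            max[i / blockSize] = Math.max(max[i / blockSize], scores[i]);
        }
        return max;
    }

    /**
     * scores: one term's per-document scores in doc-id order.
     * totalHitsThreshold: after this many hits are counted, blocks that cannot
     * beat the current k-th best score are skipped without being counted.
     */
    static Result search(float[] scores, int blockSize, int k, int totalHitsThreshold) {
        float[] blockMax = blockMaxes(scores, blockSize);
        PriorityQueue<Float> top = new PriorityQueue<>(); // min-heap holding the k best scores
        int counted = 0;
        int skipped = 0;
        for (int b = 0; b < blockMax.length; b++) {
            boolean competitive = top.size() < k || blockMax[b] > top.peek();
            if (counted >= totalHitsThreshold && !competitive) {
                skipped++; // whole block skipped: its hits are neither scored nor counted
                continue;
            }
            int end = Math.min((b + 1) * blockSize, scores.length);
            for (int i = b * blockSize; i < end; i++) {
                counted++;
                if (top.size() < k) {
                    top.add(scores[i]);
                } else if (scores[i] > top.peek()) {
                    top.poll();
                    top.add(scores[i]);
                }
            }
        }
        float[] best = new float[top.size()];
        for (int i = best.length - 1; i >= 0; i--) best[i] = top.poll(); // descending order
        return new Result(best, counted, skipped == 0, skipped);
    }

    public static void main(String[] args) {
        float[] scores = {1, 2, 3, 4, 9, 1, 1, 1, 0.5f, 0.5f, 0.5f, 0.5f, 8, 1, 1, 1};
        Result exact = search(scores, 4, 2, Integer.MAX_VALUE);
        Result pruned = search(scores, 4, 2, 4);
        System.out.println("exact:  hits=" + exact.totalHits() + " skipped=" + exact.blocksSkipped());
        System.out.println("pruned: hits>=" + pruned.totalHits() + " skipped=" + pruned.blocksSkipped());
    }
}
```

Both runs return the same top scores. However, the pruned run skips one block and reports 12 as a lower bound instead of the exact 16.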
Uwe
On 22.09.2023 at 03:40, qrdl kaggle wrote:
After testing 4800 fairly complex queries, I see a performance gain of
about 10% after running indexWriter.forceMerge(1); indexWriter.commit();
going from 209 ms per query to 185 ms per query.
The queries are quite complex, often around 30 words, of the form
text:<word> OR text:<word> OR ...
The forceMerge reduced the index from 214 files to 14.
It's a 6GB static/read-only index with about 6.4M documents. Documents
are around 1MB of text each.
I was wondering: are there any other speed-up techniques that work
well, given that forceMerge helps like this?
Is there a better way to query, while still maintaining accuracy, than
simply word-tokenizing a sentence and joining the tokens with OR text:?
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org