Michael McCandless created LUCENE-8962:
------------------------------------------
Summary: Can we merge small segments during refresh, for faster
searching?
Key: LUCENE-8962
URL: https://issues.apache.org/jira/browse/LUCENE-8962
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
With near-real-time search we ask {{IndexWriter}} to write all in-memory
segments to disk and open an {{IndexReader}} to search them, and this is
typically a quick operation.
However, when you use many threads for concurrent indexing, {{IndexWriter}}
will write many small segments during {{refresh}}, and this then adds
search-time cost, as searching must visit all of these tiny segments.
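
To make the effect concrete, here is a toy model (plain Java, not Lucene code).
It assumes each indexing thread buffers documents in its own in-memory segment
and that refresh writes every non-empty buffer as a separate on-disk segment.
The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: not Lucene code. Names here are hypothetical. */
final class PerThreadFlushSketch {

  /**
   * bufferedDocsPerThread[i] is the number of documents thread i has buffered.
   * Returns the sizes of the segments a refresh would write in this model:
   * one segment per thread that has buffered at least one document.
   */
  static List<Integer> refresh(int[] bufferedDocsPerThread) {
    List<Integer> segmentSizes = new ArrayList<>();
    for (int docs : bufferedDocsPerThread) {
      if (docs > 0) {
        segmentSizes.add(docs); // each active thread yields its own small segment
      }
    }
    return segmentSizes;
  }
}
```

In this model, the number of segments written per refresh grows with the number
of active indexing threads, while each segment only holds that thread's share
of the documents.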
The merge policy would normally quickly coalesce these small segments if given
a little time ... so, could we somehow improve {{IndexWriter}}'s refresh to
optionally kick off merge policy to merge segments below some threshold before
opening the near-real-time reader? It'd be a bit tricky because while we are
waiting for merges, indexing may continue, and new segments may be flushed, but
those new segments shouldn't be included in the point-in-time segments returned
by refresh ...
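
A rough sketch of the intended semantics, again as a toy model in plain Java
rather than anything {{IndexWriter}} does today: copy the segment list when
refresh starts, merge the segments in that copy that fall below a threshold,
and return the result as the point-in-time view. The {{Segment}} class, the
{{pointInTimeView}} method and the document-count threshold are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: not Lucene code. Names here are hypothetical. */
final class RefreshMergeSketch {

  /** A segment is modeled as just a name and a document count. */
  static final class Segment {
    final String name;
    final int docCount;

    Segment(String name, int docCount) {
      this.name = name;
      this.docCount = docCount;
    }
  }

  /**
   * Builds the view a refresh would return in this model. Segments in the
   * snapshot with fewer than smallThreshold docs are merged into one segment;
   * larger segments are kept as-is. Only the snapshot is consulted, so
   * segments flushed after the snapshot was taken cannot leak into the view.
   */
  static List<Segment> pointInTimeView(List<Segment> snapshot, int smallThreshold) {
    List<Segment> view = new ArrayList<>();
    List<Segment> small = new ArrayList<>();
    for (Segment s : snapshot) {
      if (s.docCount < smallThreshold) {
        small.add(s);
      } else {
        view.add(s);
      }
    }
    if (small.size() < 2) {
      view.addAll(small); // nothing to merge: keep the lone small segment, if any
      return view;
    }
    int total = 0;
    StringBuilder mergedName = new StringBuilder("merged");
    for (Segment s : small) {
      total += s.docCount;
      mergedName.append(s.name);
    }
    view.add(new Segment(mergedName.toString(), total));
    return view;
  }
}
```

The caller would take the snapshot (for example {{new ArrayList<>(liveSegments)}})
before waiting for the merge, so any segment flushed to the live list in the
meantime is simply not part of the returned view.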
One could almost do this on top of Lucene today, with a custom merge policy,
and some hackity logic to have the merge policy target small segments just
written by refresh, but it's tricky to then open a near-real-time reader,
excluding newly flushed but including newly merged segments since the refresh
originally finished ...
I'm not yet sure how best to solve this, so I wanted to open an issue for
discussion!