+1 to make it simple to let multiple threads help with commit/refresh
operations.

IW.yield is a simple way to achieve it, matching (roughly) how IW's
commit/refresh work today, hijacking incoming indexing threads to gain
concurrency.  I think this would be a small change?
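
To make the idea concrete, here is a rough sketch using a toy class in place of
IndexWriter. None of this is Lucene code: ToyWriter, yieldThread and
flushWithHelpers are made-up names, and the "flush" is just a counter
increment. It only shows the shape of the idea, with callers lending threads
to drain queued flush work before commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class YieldSketch {

    /** Hypothetical stand-in for IndexWriter; not the real class. */
    static class ToyWriter {
        // Segments waiting to be flushed, modelled as queued tasks.
        private final Queue<Runnable> pendingFlushes = new ConcurrentLinkedQueue<>();
        final AtomicInteger flushed = new AtomicInteger();

        void addPendingSegment() {
            // A real flush would write a segment to disk; here we only count it.
            pendingFlushes.add(flushed::incrementAndGet);
        }

        /**
         * The caller lends its thread: run queued flush work until none is left.
         * Named yieldThread because an unqualified call to a method named
         * "yield" is not allowed in recent Java versions.
         */
        void yieldThread() {
            Runnable work;
            while ((work = pendingFlushes.poll()) != null) {
                work.run();
            }
        }

        /** Commit finishes whatever is still queued on the calling thread. */
        void commit() {
            yieldThread();
        }
    }

    /** Queues some segments, lets helper threads drain them, then commits. */
    static int flushWithHelpers(int segments, int helpers) {
        ToyWriter writer = new ToyWriter();
        for (int i = 0; i < segments; i++) {
            writer.addPendingSegment();
        }
        ExecutorService pool = Executors.newFixedThreadPool(helpers);
        try {
            Runnable helper = writer::yieldThread;
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < helpers; i++) {
                futures.add(pool.submit(helper));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every helper to run out of work
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        writer.commit(); // little or nothing left for the committing thread
        return writer.flushed.get();
    }

    public static void main(String[] args) {
        System.out.println(flushWithHelpers(16, 4)); // prints 16
    }
}
```

The caller owns the threads here, so the writer never has to decide where work
runs; that is what keeps this smaller than handing IW its own ExecutorService.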

Adding an ExecutorService to e.g. IndexWriterConfig, so that all ops (commit,
refresh, and eventually merging, which today still spawns its own threads)
could run concurrently when possible, would be a nice longer-term solution,
but I suspect that's a much more invasive change than the simple IW.yield.

Progress not perfection :)

Mike McCandless

http://blog.mikemccandless.com


On Fri, Feb 15, 2019 at 4:11 PM Michael Sokolov <msoko...@gmail.com> wrote:

> I noticed that commit() was taking an inordinately long time. It turned out
> IndexWriter was flushing using only a single thread because it relies on
> its caller to supply it with threads (via updateDocument, deleteDocument,
> etc), which it then "hijacks" to do flushing. If (as we do) a caller
> indexes a lot of documents and then calls commit at the end of a large
> batch, when no indexing is ongoing, the commit() takes much longer than
> needed since it is unable to make use of multiple cores to do concurrent
> I/O.
>
> How can we support this batch-mode use case better? I think we should -
> it's not an unreasonable thing to do, since it can lead to the shortest
> overall indexing time if you have sufficient RAM and don't need to search
> until you're done indexing. I tried adding an IndexWriter.yield() method
> that just flushes pending segments and does other queued work; the caller
> can invoke this in order to provide resources. A more convenient API would
> be to grant IndexWriter an ExecutorService of its own, but this is more
> involved since it would be necessary to arbitrate where the work should be
> done. Maybe a middle ground would be to offer a commit(ExecutorService)
> method. Any other ideas? Any interest in a patch for IndexWriter.yield()?
>
> -Mike
>
