I noticed that commit() was taking an inordinately long time. It turned out IndexWriter was flushing using only a single thread, because it relies on its caller to supply it with threads (via updateDocument, deleteDocuments, etc.), which it then "hijacks" to do flushing. If (as we do) a caller indexes a lot of documents and then calls commit() at the end of a large batch, when no indexing is ongoing, the commit() takes much longer than needed since it is unable to make use of multiple cores to do concurrent I/O.
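To make the effect concrete, here is a toy model of it. This is not Lucene code: ToyWriter, addPendingFlush and threadsUsedByCommit are hypothetical names made up for illustration, and the "flush" only records which thread ran it. The only assumption it encodes is the one described above, that the writer has no threads of its own and does pending work on whichever thread calls into it.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class CallerThreadFlushDemo {

    /** Hypothetical stand-in for a writer that owns no threads of its own. */
    static class ToyWriter {
        private final Queue<Runnable> pendingFlushes = new ConcurrentLinkedQueue<>();
        final Set<String> flushingThreads = ConcurrentHashMap.newKeySet();

        /** Queue one pretend segment flush; it records the thread that runs it. */
        void addPendingFlush() {
            pendingFlushes.add(() -> flushingThreads.add(Thread.currentThread().getName()));
        }

        /** Drain every pending flush on the calling thread, one after another. */
        void commit() {
            Runnable task;
            while ((task = pendingFlushes.poll()) != null) {
                task.run();
            }
        }
    }

    /** Queue the given number of flushes, commit once, and report the threads used. */
    static Set<String> threadsUsedByCommit(int pendingSegments) {
        ToyWriter writer = new ToyWriter();
        for (int i = 0; i < pendingSegments; i++) {
            writer.addPendingFlush();
        }
        writer.commit();
        return writer.flushingThreads;
    }

    public static void main(String[] args) {
        System.out.println("threads used by commit: " + threadsUsedByCommit(8));
    }
}
```

However many flushes are queued, a lone commit() caller ends up doing all of them itself, serially, which is the pattern we see at the end of a batch.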
How can we support this batch-mode use case better? I think we should: it's not an unreasonable thing to do, since it can lead to the shortest overall indexing time if you have sufficient RAM and don't need to search until you're done indexing.

I tried adding an IndexWriter.yield() method that just flushes pending segments and does other queued work; callers can invoke it from their own threads to lend those threads to the writer. A more convenient API would be to grant IndexWriter an ExecutorService of its own, but this is more involved, since it would be necessary to arbitrate where the work should be done. Maybe a middle ground would be to offer a commit(ExecutorService) method.

Any other ideas? Any interest in a patch for IndexWriter.yield()?

-Mike