[ https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778335#action_12778335 ]
Michael McCandless commented on LUCENE-2047:
--------------------------------------------

Thinking more on this... I'm actually no longer convinced that this change is worthwhile.

Net/net this will not improve dps/qps throughput on given, fixed hardware, because this is a zero-sum game: the deletes must be resolved one way or another. Whether we do it in batch (as today) or incrementally/concurrently, one at a time as they arrive, the same work must be done.

In fact, batch should be less costly in practice, since it clearly has temporal locality in resolving terms -> postings. So on a machine whose IO cache can't hold the entire index in RAM, bulk flushing should be a win.

It's true that the net latency of reopen will be reduced by being incremental. But Lucene shouldn't aim to be able to reopen 100s of times per second. I think that's a mis-feature (most apps don't need it), and those apps that really do can and should use an approach like Zoie.

Finally, one can always set the max buffered delete terms/docs to something low to achieve this same tradeoff. It's true that won't get you concurrent resolving of deleted Terms -> docIDs, but I bet in practice that concurrency isn't necessary (i.e., a single thread resolving all buffered deletes is plenty fast).

If the reopen time today is plenty fast, especially if you configure your writer to flush often, then I don't think we need incremental resolving of the deletions?

> IndexWriter should immediately resolve deleted docs to docID in
> near-real-time mode
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2047
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2047
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
>
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs. This is necessary because we don't in general hold
> SegmentReaders open.
>
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
>
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path. And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.
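The "zero-sum" argument in the comment above can be made concrete with a small, self-contained Java sketch. This is a hypothetical toy model, not Lucene's API. The names `Segment`, `BufferedDeletes`, `deleteImmediately`, `sampleSegment` and the `termLookups` counter are invented for illustration. Sorting the buffered terms before resolving them is also an assumption, used here only to mirror the locality point. Both strategies perform the same number of term -> docIDs lookups and mark the same docs deleted. They differ only in *when* that work happens: all at once on the flush/reopen path, or spread out as each delete arrives.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, NOT Lucene's API: compares resolving deleted terms
// to docIDs in one batch (at flush/reopen time) versus one at a time.
class DeleteResolutionSketch {

    /** One segment: an inverted index (term -> docIDs) plus a deleted-docs bitset. */
    static final class Segment {
        final Map<String, int[]> postings = new TreeMap<>();
        final BitSet deleted = new BitSet();
        int termLookups = 0; // number of term -> postings resolutions performed

        void add(String term, int... docIds) { postings.put(term, docIds); }

        /** Resolves one term to its docIDs and marks those docs deleted. */
        void resolve(String term) {
            termLookups++;
            int[] docs = postings.get(term);
            if (docs != null) {
                for (int doc : docs) {
                    deleted.set(doc);
                }
            }
        }
    }

    /** Batch strategy: buffer the terms, then resolve them all when deletes are flushed. */
    static final class BufferedDeletes {
        private final List<String> pending = new ArrayList<>();

        void deleteDocuments(String term) { pending.add(term); } // cheap: only buffers

        void flush(Segment seg) {
            Collections.sort(pending); // assumed: visit terms in sorted order for locality
            for (String term : pending) {
                seg.resolve(term);
            }
            pending.clear();
        }
    }

    /** Incremental strategy: resolve each term the moment it is deleted. */
    static void deleteImmediately(Segment seg, String term) { seg.resolve(term); }

    static Segment sampleSegment() {
        Segment seg = new Segment();
        seg.add("id:1", 0);
        seg.add("id:2", 1);
        seg.add("type:old", 2, 3);
        seg.add("id:5", 4);
        return seg;
    }

    public static void main(String[] args) {
        String[] deletes = {"type:old", "id:1", "id:9"};

        Segment batch = sampleSegment();
        BufferedDeletes buffered = new BufferedDeletes();
        for (String t : deletes) buffered.deleteDocuments(t);
        buffered.flush(batch); // all resolution work lands here, on the reopen path

        Segment incremental = sampleSegment();
        for (String t : deletes) deleteImmediately(incremental, t); // work spread over time

        System.out.println("batch:       deleted=" + batch.deleted + " lookups=" + batch.termLookups);
        System.out.println("incremental: deleted=" + incremental.deleted + " lookups=" + incremental.termLookups);
    }
}
```

Both lines should report the deleted set `{0, 2, 3}` and 3 lookups, which matches the point that the total work is unchanged. In real Lucene, the knob the comment suggests for shifting this tradeoff is the writer's max-buffered-delete-terms setting, for example `IndexWriter.setMaxBufferedDeleteTerms(int)` in Lucene 2.9/3.0. Lowering it forces more frequent, smaller batches.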