[ https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777638#action_12777638 ]
Jason Rutherglen commented on LUCENE-2047:
------------------------------------------

bq. It's strange that anything here is needed

I was obtaining the segment infos synced, had a small block of unsynced code, then synced again while obtaining the sometimes-defunct readers. Once that was fixed, the errors went away!

bq. the sync(IW) is in fact necessary?

I'm hoping we can do the deletes unsynced, which would make this patch a net performance gain because multiple threads could delete concurrently (whereas today we perform the deletes synced at flush time; the current patch merely shifts the term/query lookup cost from flush to deleteDocument).

bq. buffer the deleted docIDs into DW's deletesInRAM.docIDs

I'll need to step through this, as it's a little strange to me how DW knows which doc id to cache for a particular SR, i.e. how are they mapped to an SR? Oh, there's the DW.remapDeletes method? Hmm...

Couldn't we save off a per-SR BV for the update-doc rollback case, merging the special updated-doc BV into the SR's deletes on successful flush and throwing it away on failure (a rough BV sketch appears at the end of this message)? Memory is less of a concern with the paged BV from the pending LUCENE-1526 patch.

For a delete-by-query with many hits, I'm concerned about storing too many docID Integers in BufferedDeletes.

Without syncing, new deletes could arrive while we work; we'd need to queue them and apply them to new segments, or to newly merged segments, because we're not locking the segments. Otherwise some deletes could be lost.

A possible solution: deleteDocument would synchronously add the delete query/term to a per-SR queue and return. The deletes would then be applied asynchronously, in background threads. Merging would aggregate the incoming SRs' queued (not-yet-applied) deletes into the merged reader's delete queue. On flush we'd wait for these queued deletes to be applied; after flush the queues would be empty and we'd start over. And because the delete queue is per reader, it would be thrown away with the closed reader (a rough sketch of this queue also appears at the end of this message).

> IndexWriter should immediately resolve deleted docs to docID in near-real-time mode
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2047
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2047
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the Term and only later, when it's time to flush deletes, resolve it to docIDs. This is necessary because we don't in general hold SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when reopening a new NRT reader by taking this resolution off the reopen path. And if multiple threads are used to do the deletion, then we gain concurrency, vs. reopen, which is not concurrent when flushing the deletes.
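Roughly what I have in mind for the per-SR rollback BV (class/method names are made up and are not actual Lucene APIs; java.util.BitSet stands in for the paged BitVector from LUCENE-1526):

{code:java}
// Deletes caused by updateDocument go into a side BV; they only become part of
// the SR's real deletes if the flush succeeds, and are discarded on failure.
import java.util.BitSet;

class UpdateDocDeletes {
  private final BitSet srDeletes;          // the SR's "real" deleted-docs bits
  private final BitSet updatedDocDeletes;  // deletes from updateDocument, not yet durable

  UpdateDocDeletes(int maxDoc) {
    srDeletes = new BitSet(maxDoc);
    updatedDocDeletes = new BitSet(maxDoc);
  }

  // updateDocument: mark the old version of the doc only in the side BV.
  synchronized void markUpdated(int docID) {
    updatedDocDeletes.set(docID);
  }

  // Flush succeeded: fold the side BV into the SR's deletes.
  synchronized void commitFlush() {
    srDeletes.or(updatedDocDeletes);
    updatedDocDeletes.clear();
  }

  // Flush failed: throw the side BV away; the SR's deletes are untouched.
  synchronized void rollbackFlush() {
    updatedDocDeletes.clear();
  }

  synchronized boolean isDeleted(int docID) {
    return srDeletes.get(docID);
  }
}
{code}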
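And a rough sketch of the per-reader delete queue (again, names are made up, not actual Lucene APIs; the resolver function stands in for the term/query-to-docID lookup against the SR, and BitSet stands in for its deleted-docs BV):

{code:java}
// deleteDocument enqueues synchronously and returns; a background thread
// resolves queued terms/queries to docIDs; merging aggregates unapplied
// queues; flush waits for the queue to drain; the queue dies with its reader.
import java.util.BitSet;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class PerReaderDeleteQueue<D> {
  private final LinkedBlockingQueue<D> pending = new LinkedBlockingQueue<>();
  private final BitSet deletedDocs;                // stands in for the SR's deleted-docs BV
  private final Function<D, int[]> resolveDocIDs;  // term/query -> matching docIDs in this segment
  private final Thread worker;
  private volatile boolean closed;

  PerReaderDeleteQueue(int maxDoc, Function<D, int[]> resolveDocIDs) {
    this.deletedDocs = new BitSet(maxDoc);
    this.resolveDocIDs = resolveDocIDs;
    this.worker = new Thread(this::applyLoop, "per-reader-delete-resolver");
    this.worker.start();
  }

  // deleteDocument(Term/Query): cheap synchronous enqueue, then return.
  void enqueue(D termOrQuery) {
    pending.add(termOrQuery);
  }

  // Background application of queued deletes.
  private void applyLoop() {
    try {
      while (!closed || !pending.isEmpty()) {
        D next = pending.poll(10, TimeUnit.MILLISECONDS);
        if (next == null) continue;
        for (int docID : resolveDocIDs.apply(next)) {
          synchronized (deletedDocs) {
            deletedDocs.set(docID);
          }
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Merge: the merged reader inherits the source readers' unapplied deletes.
  void aggregateFrom(PerReaderDeleteQueue<D> source) {
    source.pending.drainTo(this.pending);
  }

  // Flush: wait for the queue to empty before committing (a real version
  // would also track the delete currently being applied).
  void awaitApplied() throws InterruptedException {
    while (!pending.isEmpty()) {
      Thread.sleep(1);
    }
  }

  // The queue is thrown away with the closed reader.
  void close() throws InterruptedException {
    closed = true;
    worker.join();
  }
}
{code}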