[ https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777436#action_12777436 ]
Michael McCandless commented on LUCENE-2047:
--------------------------------------------

{quote}
we'd need to prevent the deletion of the SR's files we're deleting from, even if that SR is no longer live.
{quote}

It's strange that anything here is needed, because when you check a reader out from the pool, it's incRef'd, which should mean the files need no protection. Something strange is up... could it be that when you check out that reader to do deletions, it wasn't already open, and then on trying to open it, its files were already deleted? (In which case, that segment has been merged away, and the merge has committed, ie already carried over all deletes, and so you should instead be deleting against that merged segment.)

So I think the sync(IW) is in fact necessary? Note that the current approach (deferring resolving term -> docIDs until flush time) also sync(IW)'d, so we're not really changing that here. Though I agree it would be nice to not have to sync(IW). Really what we need to sync on is "any merge that is merging this segment away and now wants to commit". That's actually a very narrow event, so someday (separate issue) if we could refine the sync'ing to that, it should be a good net throughput improvement for updateDocument.

{quote}
What happens to documents that need to be deleted but are still in the RAM buffer?
{quote}

Ahh, yes. We must still buffer for this case, and resolve these deletes against the newly flushed segment. I think we need a separate buffer that tracks pending delete terms only against the RAM buffer?

Also, instead of actually setting the bits in SR's deletedDocs, I think you should buffer the deleted docIDs into DW's deletesInRAM.docIDs? Ie, we do the resolution of Term/Query -> docID, but buffer the docIDs we resolved to.
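A rough sketch of that buffering, in Java. Everything here is hypothetical and only illustrates the idea: `PendingDeletes` stands in for DW's deletesInRAM, segments are keyed by name strings, delete terms are plain strings, and the method names are invented. None of this is Lucene's actual code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for DW's deletesInRAM; not Lucene's real class. */
class PendingDeletes {
    // Deletes already resolved to docIDs, keyed by (hypothetical) segment name.
    private final Map<String, List<Integer>> docIDsBySegment = new HashMap<>();
    // Delete terms that may still match docs sitting in the RAM buffer.
    private final List<String> termsForRamBuffer = new ArrayList<>();

    /** Record a resolved delete without touching the segment's deleted-docs bits. */
    synchronized void bufferDocID(String segment, int docID) {
        docIDsBySegment.computeIfAbsent(segment, s -> new ArrayList<>()).add(docID);
    }

    /** Record a term that must be resolved against the next flushed segment. */
    synchronized void bufferTermForRam(String term) {
        termsForRamBuffer.add(term);
    }

    /**
     * Successful flush: set the buffered docIDs in the given deleted-docs bits,
     * and return the terms the caller must resolve against the new segment.
     */
    synchronized List<String> applyTo(Map<String, BitSet> deletedDocs) {
        for (Map.Entry<String, List<Integer>> e : docIDsBySegment.entrySet()) {
            BitSet bits = deletedDocs.computeIfAbsent(e.getKey(), s -> new BitSet());
            for (int docID : e.getValue()) {
                bits.set(docID);
            }
        }
        List<String> terms = new ArrayList<>(termsForRamBuffer);
        clear();
        return terms;
    }

    /** Aborting exception: drop all buffered deletes so none are committed. */
    synchronized void abort() {
        clear();
    }

    private void clear() {
        docIDsBySegment.clear();
        termsForRamBuffer.clear();
    }
}
```

With this shape, the deleted-docs bits only change in applyTo(), so calling abort() after a failed batch of updateDocument calls leaves the previously flushed segments exactly as they were.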
This is necessary for correctness in exceptional situations, eg if you do a bunch of updateDocuments, and then DW hits an aborting exception (meaning its RAM buffer may be corrupt), DW currently discards the RAM buffer but leaves previously flushed segments intact, so that if you then commit, you have a consistent index. Ie, in that situation, we don't want the docs deleted by those updateDocument calls to be committed to the index, so we need to buffer them.

> IndexWriter should immediately resolve deleted docs to docID in near-real-time mode
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2047
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2047
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs. This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path. And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen, which is not concurrent when flushing the
> deletes.