[ https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922894#action_12922894 ]
Michael McCandless commented on LUCENE-2655:
--------------------------------------------

bq. You're saying record a list of segments that existed at the time of flushing a DWPT's deletes?

Actually, I think it's simpler! I think the DWPT just records the index of the last segment in the index, as of when it is created (or re-inited after it's been flushed).

On flush of a given DWPT, its buffered deletes are recorded against the newly flushed segment, and still carry over the lastSegmentIndex. This way, when we finally do resolve these deletes to docIDs, we apply a delete if 1) the segment's index is <= lastSegmentIndex (always), or 2) the doc is in the DWPT's own flushed segment and its docID is <= the docID-upto.

I think this would mean we can keep the docID-upto as local docIDs, which is nice (no globalizing/shifting-on-merge/flush needed).

So, segments flushed in the current IW session will carry this private pool of pending deletes. But pre-existing segments in the index don't need their own pool. Instead, when it's time to resolve the buffered deletes against them (because they are about to be merged), they must walk all of the per-segment pools, resolving the deletes from a pool if the segment's own index is <= the lastSegmentIndex of that pool.

We should take care to efficiently handle duplicated terms, i.e., where the same delete term is present in multiple pools. The most recent one "wins", and we should do only one delete (per segment) for that term.

These per-segment delete pools must be updated on merge. E.g., if the segment at a pool's lastSegmentIndex gets merged, that's fine, but then on merge commit we must move that lastSegmentIndex "backwards" to the last segment before the merge, because any deletes necessary within the merged segments will have been resolved already.

When segments with delete pools are merged, we obviously apply the deletes to the segments being merged, but then we have to coalesce those pools and move them into a single pool on the segment just before the merge.
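
As a rough illustration of the resolution rule described above, here is a minimal Java sketch. It is not taken from any patch on this issue: `DeletePoolSketch`, `DeletePool`, `ownSegmentIndex`, `bufferDelete`, and `applies` are hypothetical names, delete terms are modeled as plain strings rather than Lucene `Term` objects, and the `<=` comparison against the docID-upto is used exactly as written in the comment (whether that bound is inclusive in practice depends on how the docID-upto is recorded).

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; none of these names come from the Lucene code base. */
final class DeletePoolSketch {

    /** Hypothetical pool of deletes buffered by one DWPT and carried by the segment it flushed. */
    static final class DeletePool {
        /** Index of the segment this DWPT was flushed to (assumed to be tracked with the pool). */
        final int ownSegmentIndex;
        /** Index of the last segment in the index when the DWPT was created or re-inited. */
        final int lastSegmentIndex;
        /** Delete term -> docID-upto, kept as a segment-local docID. */
        private final Map<String, Integer> docIdUptoByTerm = new HashMap<>();

        DeletePool(int ownSegmentIndex, int lastSegmentIndex) {
            this.ownSegmentIndex = ownSegmentIndex;
            this.lastSegmentIndex = lastSegmentIndex;
        }

        /** Buffers a delete; a later delete of the same term overwrites the earlier docID-upto. */
        void bufferDelete(String term, int docIdUpto) {
            docIdUptoByTerm.put(term, docIdUpto);
        }

        /** Returns true if the buffered delete for {@code term} hits the given doc. */
        boolean applies(String term, int segmentIndex, int localDocId) {
            Integer docIdUpto = docIdUptoByTerm.get(term);
            if (docIdUpto == null) {
                return false; // this pool has no delete buffered for the term
            }
            if (segmentIndex <= lastSegmentIndex) {
                return true; // case 1: segment already existed when the DWPT started
            }
            // case 2: only docs in the DWPT's own segment, up to the local docID-upto
            return segmentIndex == ownSegmentIndex && localDocId <= docIdUpto;
        }
    }

    public static void main(String[] args) {
        // DWPT started when segment 2 was the last segment, then flushed as segment 3.
        DeletePool pool = new DeletePool(3, 2);
        pool.bufferDelete("id:7", 5);

        System.out.println(pool.applies("id:7", 0, 100)); // true: older segment
        System.out.println(pool.applies("id:7", 3, 4));   // true: own segment, within docID-upto
        System.out.println(pool.applies("id:7", 3, 6));   // false: doc added after the delete
        System.out.println(pool.applies("id:7", 4, 0));   // false: segment flushed later
    }
}
```

Because case 2 compares only against a docID local to the pool's own segment, nothing in the sketch needs to be shifted when other segments are flushed or merged, which is the property the comment calls out.
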
We could actually use a Set at this point since there is no more docID-upto for this pool (i.e., it applies to all docs on that segment and in segments prior to it).

So I think this is much simpler than I first thought!

bq. Lets get that data structure mapped out to start on LUCENE-2680?

+1

> Get deletes working in the realtime branch
> ------------------------------------------
>
>                 Key: LUCENE-2655
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2655
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.