[ 
https://issues.apache.org/jira/browse/LUCENE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922894#action_12922894
 ] 

Michael McCandless commented on LUCENE-2655:
--------------------------------------------

bq. You're saying record a list of segments that existed at the time of 
flushing a DWPT's deletes?

Actually, I think it's simpler!  I think the DWPT just records the index of the 
last segment in the index, as of when it is created (or re-inited after it's 
been flushed).

On flush of a given DWPT, its buffered deletes are recorded against the 
segment it flushes, and they still carry over the lastSegmentIndex.  This way, 
when we finally do resolve these deletes to docIDs, we apply a delete if either 
1) the segment's index is <= lastSegmentIndex, or 2) the doc is in the flushed 
segment itself and its docID is <= the docid-upto.  I think this'd mean we can 
keep the docid-upto as local docIDs, which is nice (no 
globalizing/shifting-on-merge/flush needed).
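
A rough sketch of that resolution rule, just to make the two cases concrete. 
Everything here (SegmentDeletePool, bufferDelete, applies, the field names) is 
a made-up name for illustration, not an existing Lucene class or method, and 
delete terms are modeled as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-segment delete pool; not part of the Lucene API.
final class SegmentDeletePool {
    // Index of the segment this DWPT flushed (the pool is recorded against it).
    final int ownSegmentIndex;
    // Last segment in the index when the DWPT was created or re-inited.
    final int lastSegmentIndex;
    // Delete term -> docid-upto, kept as a LOCAL docID within the flushed segment.
    private final Map<String, Integer> docIdUpto = new HashMap<>();

    SegmentDeletePool(int ownSegmentIndex, int lastSegmentIndex) {
        this.ownSegmentIndex = ownSegmentIndex;
        this.lastSegmentIndex = lastSegmentIndex;
    }

    // Buffer a delete; a later delete of the same term overwrites the earlier upto.
    void bufferDelete(String term, int localDocIdUpto) {
        docIdUpto.put(term, localDocIdUpto);
    }

    // True if a buffered delete for `term` applies to the given doc.
    boolean applies(String term, int segmentIndex, int localDocId) {
        Integer upto = docIdUpto.get(term);
        if (upto == null) {
            return false; // no delete buffered for this term
        }
        if (segmentIndex <= lastSegmentIndex) {
            return true; // case 1: segment existed before this DWPT started
        }
        // case 2: doc is in the DWPT's own segment, at or before the docid-upto
        return segmentIndex == ownSegmentIndex && localDocId <= upto;
    }
}
```

Note that a segment sitting between lastSegmentIndex and the pool's own segment 
matches neither case in this sketch, which follows the rule exactly as stated 
above.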

So, segments flushed in the current IW session will carry this private pool of 
pending deletes.  But pre-existing segments in the index don't need their own 
pool.  Instead, when it's time to resolve the buffered deletes against one of 
them (because it is about to be merged), we must walk all of the per-segment 
pools, resolving the deletes from a pool if the pre-existing segment's index is 
<= the lastSegmentIndex of that pool.  We should take care to efficiently handle 
duplicated terms, i.e. where the same del term is present in multiple pools.  
The most recent one "wins", and we should do only one delete (per segment) for 
that term.
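
To illustrate the walk over the pools, here is a minimal sketch. PoolWalker, 
Pool and termsToDelete are hypothetical names, and terms are plain strings. 
Since a pre-existing segment has no docid-upto to compare against, this sketch 
assumes that collapsing duplicates into a set is enough to get one delete per 
term per segment.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the Lucene API.
final class PoolWalker {
    // Minimal stand-in for a per-segment pool: its lastSegmentIndex and its delete terms.
    static final class Pool {
        final int lastSegmentIndex;
        final Set<String> terms;

        Pool(int lastSegmentIndex, String... terms) {
            this.lastSegmentIndex = lastSegmentIndex;
            this.terms = new LinkedHashSet<>(Arrays.asList(terms));
        }
    }

    // Collect each delete term at most once for a pre-existing segment.
    static Set<String> termsToDelete(int segmentIndex, List<Pool> pools) {
        Set<String> result = new LinkedHashSet<>();
        for (Pool pool : pools) {
            // Only pools whose DWPT started after this segment existed apply to it.
            if (segmentIndex <= pool.lastSegmentIndex) {
                result.addAll(pool.terms); // the set drops terms seen in an earlier pool
            }
        }
        return result;
    }
}
```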

These per-segment delete pools must be updated on merge.  E.g., if the segment 
at a pool's lastSegmentIndex gets merged, that's fine, but then on merge commit 
we must move that lastSegmentIndex "backwards" to the last segment before the 
merge, because any deletes necessary within the merged segments will have been 
resolved already.
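
The "move backwards" step might look roughly like this. MergeAdjuster and 
adjustLastSegmentIndex are hypothetical names, and the merge is assumed to 
cover a contiguous range of segment indexes [mergeStart, mergeEnd]. The sketch 
deliberately leaves aside any renumbering of segments that come after the 
merged range.

```java
// Hypothetical helper; not part of the Lucene API.
final class MergeAdjuster {
    // Returns the lastSegmentIndex a pool should carry after the merge commits.
    static int adjustLastSegmentIndex(int lastSegmentIndex, int mergeStart, int mergeEnd) {
        if (lastSegmentIndex >= mergeStart && lastSegmentIndex <= mergeEnd) {
            // Deletes inside the merged segments were resolved during the merge,
            // so the pool now only reaches back to the segment just before it.
            // A result of -1 means there is no earlier segment left to apply to.
            return mergeStart - 1;
        }
        return lastSegmentIndex; // pool does not point into the merged range
    }
}
```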

When segments with del pools are merged, we obviously apply the deletes to the 
segments being merged, but, then, we have to coalesce those pools and move them 
into a single pool on the segment just before the merge.  We could actually use 
a Set at this point since there is no more docid-upto for this pool (ie, it 
applies to all docs on that segment and in segments prior to it).
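
A small sketch of that coalescing step, assuming the deletes have already been 
applied to the segments being merged. PoolCoalescer and coalesceOnMerge are 
hypothetical names, pools are modeled as plain sets of term strings keyed by 
segment index (no docid-upto, as noted above), and the merge is assumed to 
cover the contiguous range [mergeStart, mergeEnd].

```java
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;

// Hypothetical helper; not part of the Lucene API.
final class PoolCoalescer {
    // Fold the pools of the merged segments into one Set on the segment just before the merge.
    static void coalesceOnMerge(NavigableMap<Integer, Set<String>> poolsBySegment,
                                int mergeStart, int mergeEnd) {
        // Live view of the pools attached to the merged segments.
        Map<Integer, Set<String>> merged =
            poolsBySegment.subMap(mergeStart, true, mergeEnd, true);

        Set<String> combined = new HashSet<>();
        for (Set<String> terms : merged.values()) {
            combined.addAll(terms); // duplicates across pools collapse here
        }
        merged.clear(); // removes those entries from poolsBySegment as well

        // With no segment before the merge there is nothing left to apply to.
        if (mergeStart > 0 && !combined.isEmpty()) {
            poolsBySegment
                .computeIfAbsent(mergeStart - 1, k -> new HashSet<>())
                .addAll(combined);
        }
    }
}
```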

So I think this is much simpler than I first thought!

bq. Let's get that data structure mapped out to start on LUCENE-2680?

+1

> Get deletes working in the realtime branch
> ------------------------------------------
>
>                 Key: LUCENE-2655
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2655
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2655.patch
>
>
> Deletes don't work anymore, a patch here will fix this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


