[ 
https://issues.apache.org/jira/browse/LUCENE-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988472#action_12988472
 ] 

Michael McCandless commented on LUCENE-2897:
--------------------------------------------

bq. I had to read this a few times, yes it's very elegant as we're skipping the 
postings that otherwise would be deleted immediately after flush, and we're 
reusing the terms map already in DWPT.

Well... I think we can't [easily] skip writing the postings, because that could 
result in non-deterministic behavior (I put a comment on this in the patch).

If we did the flush w/ 2 passes (first pass to mark all del docs and 2nd to 
flush) then we could skip writing postings of docs that were deleted.  But I 
suspect that's too much cost on flush.

With a single pass, we'd end up writing some postings for the doc, but not all, 
depending on the order in which its terms arrived vs its deleted terms.
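
To make the single-pass vs 2-pass difference concrete, here is a minimal, 
self-contained sketch. It is not Lucene code: FlushSketch, singlePass and 
twoPass are hypothetical names, the "terms map" is just a sorted map from term 
to docIDs, and visiting terms in sorted order stands in for whatever order the 
real flush sees them in.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Lucene.
public class FlushSketch {

    // Single pass: mark docs deleted only when a delete term is reached,
    // and skip postings of docs that are already marked at that point.
    static Map<String, List<Integer>> singlePass(TreeMap<String, List<Integer>> terms,
                                                 Set<String> deleteTerms) {
        Set<Integer> deleted = new HashSet<>();
        Map<String, List<Integer>> written = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : terms.entrySet()) {
            if (deleteTerms.contains(e.getKey())) {
                deleted.addAll(e.getValue());
            }
            written.put(e.getKey(), keepLive(e.getValue(), deleted));
        }
        return written;
    }

    // Two passes: first collect every deleted doc, then write postings.
    static Map<String, List<Integer>> twoPass(TreeMap<String, List<Integer>> terms,
                                              Set<String> deleteTerms) {
        Set<Integer> deleted = new HashSet<>();
        for (String t : deleteTerms) {
            List<Integer> docs = terms.get(t);
            if (docs != null) {
                deleted.addAll(docs);
            }
        }
        Map<String, List<Integer>> written = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : terms.entrySet()) {
            written.put(e.getKey(), keepLive(e.getValue(), deleted));
        }
        return written;
    }

    // Returns the docIDs that are not (yet) marked deleted.
    private static List<Integer> keepLive(List<Integer> docs, Set<Integer> deleted) {
        List<Integer> out = new ArrayList<>();
        for (int doc : docs) {
            if (!deleted.contains(doc)) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

With terms "body:lucene" -> [0, 1], "id:1" -> [1], "title:flush" -> [0, 1] and 
a delete against "id:1", the single pass still writes doc 1 under "body:lucene" 
(visited before the delete term) but drops it under "title:flush", whereas the 
2-pass version drops doc 1 everywhere.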

I mean, in practice, an app is gonna delete against ID field (typically) so if 
we "knew" that down deep here in Luceneland we could do the first pass only 
against that one field...

Also, merge is still going to have to apply del docs, since e.g. stored fields 
have already been written for the deleted docs.

> apply delete-by-Term and docID immediately to newly flushed segments
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2897
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-2897.patch
>
>
> Spinoff from LUCENE-2324.
> When we flush deletes today, we keep them as buffered Term/Query/docIDs that 
> need to be deleted.  But, for a newly flushed segment (ie fresh out of the 
> DWPT), this is silly, because during flush we visit all terms and we know 
> their docIDs.  So it's more efficient to apply the deletes (for this one 
> segment) at that time.
> We still must buffer deletes for all prior segments, but these deletes don't 
> need to map to a docIDUpto anymore; ie we just need a Set.
> This issue should wait until LUCENE-1076 is in since that issue cuts over 
> buffered deletes to a transactional stream.


