[jira] [Commented] (LUCENE-5693) don't write deleted documents on flush

Shai Erera (JIRA) Wed, 21 May 2014 09:07:19 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004835#comment-14004835
 ]


Shai Erera commented on LUCENE-5693:
------------------------------------

Today we apply the deletes (update the bitset) when a Reader is requested. 
At that point, we have a SegmentReader at hand and can resolve the 
delete-by-Term/Query to the actual doc IDs ... how would we do that while 
the segment is being flushed? How do we know which documents were associated 
with {{Term t}} when it was sent as a delete?
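
To make the question concrete, here is a toy Java sketch. It is not Lucene 
code: the class {{DeleteResolutionSketch}} and its methods are hypothetical 
names made up for illustration. It models resolving a delete-by-Term to doc 
IDs through a term-to-docs lookup, which is the role the SegmentReader plays 
today. Whether an equivalent lookup is available at flush time is exactly the 
open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a buffered segment; NOT Lucene's API.
class DeleteResolutionSketch {
    // term -> IDs of the docs containing it (a stand-in for postings)
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Buffers a document and records which terms it contains.
    int addDocument(String... terms) {
        int docId = docCount++;
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // Resolves delete terms to doc IDs; a set bit means the doc is live.
    // Without the term -> docs lookup there is nothing to resolve against.
    BitSet resolveLiveDocs(List<String> deleteTerms) {
        BitSet live = new BitSet(docCount);
        live.set(0, docCount); // every doc starts out live
        for (String term : deleteTerms) {
            List<Integer> hits =
                postings.getOrDefault(term, Collections.<Integer>emptyList());
            for (int docId : hits) {
                live.clear(docId); // this doc is "born deleted"
            }
        }
        return live;
    }

    public static void main(String[] args) {
        DeleteResolutionSketch segment = new DeleteResolutionSketch();
        segment.addDocument("id:1"); // doc 0
        segment.addDocument("id:2"); // doc 1
        segment.addDocument("id:1"); // doc 2
        BitSet live = segment.resolveLiveDocs(Arrays.asList("id:1"));
        System.out.println(live); // prints {1}
    }
}
```

The sketch deliberately ignores ordering (a delete should only affect 
documents added before it) and delete-by-Query, both of which make the real 
problem harder.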

When I worked on LUCENE-5189 (NumericDocValues update), I had the same thought 
-- why flush the original numeric value when the document has already been 
updated? But I ran into the same issue: how do we know which documents were 
affected by the update Term?

> don't write deleted documents on flush
> --------------------------------------
>
>                 Key: LUCENE-5693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5693
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>
> When we flush a new segment, sometimes some documents are "born deleted", 
> e.g. if the app did an IW.deleteDocuments that matched some not-yet-flushed 
> documents.
> We already compute the liveDocs on flush, but then we continue (wastefully) 
> to send those known-deleted documents to all Codec parts.
> I started to implement this on LUCENE-5675 but it was too controversial.
> Also, I expect typically the number of deleted docs is 0, or small, so not 
> writing "born deleted" docs won't be much of a win for most apps.  Still it 
> seems silly to write them, consuming IO/CPU in the process, only to consume 
> more IO/CPU later for merging to re-delete them.


