[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004835#comment-14004835 ]
Shai Erera commented on LUCENE-5693:
------------------------------------

Today we apply the deletes (update the bitset) when a Reader is being requested. At that point, we have a SegmentReader at hand and we can resolve the delete-by-Term/Query to the actual doc IDs ... how would we do that while the segment is flushed? How do we know which documents were associated with {{Term t}}, while it was sent as a delete?

When I worked on LUCENE-5189 (NumericDocValues update), I had the same thought -- why flush the original numeric value when the document has already been updated? But I had the same issue - which documents were affected by the update Term.

> don't write deleted documents on flush
> --------------------------------------
>
>                 Key: LUCENE-5693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5693
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>
> When we flush a new segment, sometimes some documents are "born deleted",
> e.g. if the app did a IW.deleteDocuments that matched some not-yet-flushed
> documents.
> We already compute the liveDocs on flush, but then we continue (wastefully)
> to send those known-deleted documents to all Codec parts.
> I started to implement this on LUCENE-5675 but it was too controversial.
> Also, I expect typically the number of deleted docs is 0, or small, so not
> writing "born deleted" docs won't be much of a win for most apps. Still it
> seems silly to write them, consuming IO/CPU in the process, only to consume
> more IO/CPU later for merging to re-delete them.
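
To make the question in this thread concrete, here is a minimal, self-contained Java sketch of what "born deleted" means and what skipping such documents at flush would involve. It is a toy model, not Lucene code: the class `BornDeletedSketch` and its methods are hypothetical, and it assumes each buffered document's terms are still available in memory at flush time, which is exactly the point being questioned above. A delete-by-term is modeled as applying only to documents buffered before the delete was issued.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a not-yet-flushed segment; hypothetical, NOT Lucene's classes. */
public class BornDeletedSketch {
    /** Each buffered document is modeled as the list of terms it contains. */
    private final List<List<String>> bufferedDocs = new ArrayList<>();

    /** For each delete term: exclusive upper bound of the doc IDs it applies to. */
    private final Map<String, Integer> deleteTermUpto = new HashMap<>();

    public void addDocument(List<String> terms) {
        bufferedDocs.add(terms);
    }

    /** Buffer a delete-by-term; it affects only docs added before this call. */
    public void deleteDocuments(String term) {
        deleteTermUpto.put(term, bufferedDocs.size());
    }

    /** Resolve buffered delete terms to doc IDs (assumes terms are in memory). */
    public BitSet computeLiveDocs() {
        BitSet live = new BitSet(bufferedDocs.size());
        live.set(0, bufferedDocs.size());
        for (int docId = 0; docId < bufferedDocs.size(); docId++) {
            for (String term : bufferedDocs.get(docId)) {
                Integer upto = deleteTermUpto.get(term);
                if (upto != null && docId < upto) {
                    live.clear(docId); // this doc is "born deleted"
                    break;
                }
            }
        }
        return live;
    }

    /** Doc IDs that would still be written if born-deleted docs were skipped. */
    public List<Integer> docsToWrite() {
        BitSet live = computeLiveDocs();
        List<Integer> out = new ArrayList<>();
        for (int d = live.nextSetBit(0); d >= 0; d = live.nextSetBit(d + 1)) {
            out.add(d);
        }
        return out;
    }

    public static void main(String[] args) {
        BornDeletedSketch buffer = new BornDeletedSketch();
        buffer.addDocument(Arrays.asList("id:1")); // doc 0
        buffer.addDocument(Arrays.asList("id:2")); // doc 1
        buffer.deleteDocuments("id:1");            // matches doc 0 only
        buffer.addDocument(Arrays.asList("id:1")); // doc 2, added after the delete
        System.out.println(buffer.docsToWrite());  // prints [1, 2]
    }
}
```

In this model doc 0 is never handed to the writer, while doc 2 survives because it was added after the delete. Whether a real flush can resolve terms to doc IDs this cheaply is the open question in the comment above; the sketch only illustrates the bookkeeping, under the stated assumption.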