[
https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004906#comment-14004906
]
Michael McCandless commented on LUCENE-5693:
--------------------------------------------
bq. This only makes sense for postings though.
Right, postings are much easier than doc values. But postings are also the most
costly to merge.
bq. By writing them some places and not writing them other places, we open the
possibility of extremely confusing corner cases and bugs.
I disagree: I think we would discover places that are "relying" on deleted-docs
behavior, i.e. test bugs. When I did this on LUCENE-5675, only a few places
relied on deleted docs.
> don't write deleted documents on flush
> --------------------------------------
>
> Key: LUCENE-5693
> URL: https://issues.apache.org/jira/browse/LUCENE-5693
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
>
> When we flush a new segment, sometimes some documents are "born deleted",
> e.g. if the app did an IW.deleteDocuments that matched some not-yet-flushed
> documents.
> We already compute the liveDocs on flush, but then we continue (wastefully)
> to send those known-deleted documents to all Codec parts.
> I started to implement this on LUCENE-5675 but it was too controversial.
> Also, I expect typically the number of deleted docs is 0, or small, so not
> writing "born deleted" docs won't be much of a win for most apps. Still it
> seems silly to write them, consuming IO/CPU in the process, only to consume
> more IO/CPU later for merging to re-delete them.
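
To make the idea in the description concrete, here is a minimal toy sketch in plain Java. It is not Lucene's flush code: the class and method names (BornDeletedFlushSketch, computeLiveDocs, liveOnly) are hypothetical, java.util.BitSet stands in for the liveDocs computed at flush, and a list of strings stands in for the buffered documents. It also ignores the fact that skipping documents would renumber the docIDs of the ones that remain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy model only: hypothetical names, no Lucene APIs involved.
public class BornDeletedFlushSketch {

    /** Marks every buffered doc live, then clears the bit of each doc equal to deletedId. */
    public static BitSet computeLiveDocs(List<String> bufferedDocs, String deletedId) {
        BitSet live = new BitSet(bufferedDocs.size());
        live.set(0, bufferedDocs.size());
        for (int docId = 0; docId < bufferedDocs.size(); docId++) {
            if (bufferedDocs.get(docId).equals(deletedId)) {
                live.clear(docId); // this doc is "born deleted"
            }
        }
        return live;
    }

    /** Returns only the docs whose bit is set in liveDocs, preserving their order. */
    public static List<String> liveOnly(List<String> bufferedDocs, BitSet liveDocs) {
        List<String> out = new ArrayList<>();
        for (int docId = 0; docId < bufferedDocs.size(); docId++) {
            if (liveDocs.get(docId)) {
                out.add(bufferedDocs.get(docId));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> buffered = Arrays.asList("a", "b", "c", "d");
        // Assume the app deleted "b" before the segment was flushed.
        BitSet live = computeLiveDocs(buffered, "b");
        System.out.println("sent to the codec today:      " + buffered);
        System.out.println("sent if born-deleted skipped: " + liveOnly(buffered, live));
    }
}
```

In this model the saving is proportional to the number of cleared bits, which matches the expectation above that the win is small when few or no documents are born deleted.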