[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005450#comment-14005450 ]
Robert Muir commented on LUCENE-5693:
-------------------------------------

{quote}
I disagree: I think we discover places that are "relying" on deleted docs behavior, i.e. test bugs. When I did this on LUCENE-5675 there were only a few places that relied on deleted docs.
{quote}

That's not the complexity I'm concerned about. I'm talking about bugs in Lucene itself, because shit like the following happens:
* various codec APIs unable to cope with writing 0-doc segments because all the docs were deleted
* various codec APIs with corner-case bugs because stuff like 'maxdoc' in the segmentinfo they are fed is inconsistent with what they saw
* various index/search APIs unable to cope when docid X appears in codec API Y but not in codec API Z, where it's expected to exist
* slow O(n) passes through IndexWriter APIs to recalculate and reshuffle ordinals and stuff like that
* corner-case bugs like incorrect statistics
* additional complexity inside IndexWriter/codecs to handle this, when just merging away would be better

So if we want to rename the issue to "as a special case, don't write deleted postings on flush" and remove the TODO about changing this for things like DV, then I'm fine. But otherwise, if this is intended to be a precedent of how things should work, then I strongly feel we should not do this. The additional complexity and corner cases are simply not worth it.

> don't write deleted documents on flush
> --------------------------------------
>
>                 Key: LUCENE-5693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5693
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5693.patch
>
>
> When we flush a new segment, sometimes some documents are "born deleted",
> e.g. if the app did an IW.deleteDocuments that matched some not-yet-flushed
> documents.
> We already compute the liveDocs on flush, but then we continue (wastefully)
> to send those known-deleted documents to all Codec parts.
> I started to implement this on LUCENE-5675 but it was too controversial.
> Also, I expect typically the number of deleted docs is 0, or small, so not
> writing "born deleted" docs won't be much of a win for most apps. Still it
> seems silly to write them, consuming IO/CPU in the process, only to consume
> more IO/CPU later for merging to re-delete them.
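
To make the bookkeeping under discussion concrete, here is a minimal Java sketch of what skipping "born deleted" documents at flush would roughly involve. It assumes a toy in-memory buffer of documents, not Lucene's real flush path: the class FlushSketch and the methods buildDocMap and flushLiveOnly are hypothetical names for illustration, not Lucene APIs. It touches two of the hazards listed in the comment: every per-document structure has to agree on the remapped docIDs and the new maxDoc, and a fully deleted buffer produces a 0-doc segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
public class FlushSketch {

    /** Maps each buffered docID to its docID after dropping deleted docs, or -1 if deleted. */
    public static int[] buildDocMap(BitSet liveDocs, int maxDoc) {
        int[] docMap = new int[maxDoc];
        int nextDocId = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            docMap[doc] = liveDocs.get(doc) ? nextDocId++ : -1;
        }
        return docMap;
    }

    /** Returns only the live documents, in their original order. */
    public static List<String> flushLiveOnly(List<String> buffered, BitSet liveDocs) {
        List<String> written = new ArrayList<>();
        for (int doc = 0; doc < buffered.size(); doc++) {
            if (liveDocs.get(doc)) {
                written.add(buffered.get(doc));
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> buffered = Arrays.asList("a", "b", "c", "d");

        // Doc 1 was "born deleted" before the flush.
        BitSet liveDocs = new BitSet(buffered.size());
        liveDocs.set(0);
        liveDocs.set(2);
        liveDocs.set(3);

        int[] docMap = buildDocMap(liveDocs, buffered.size());
        List<String> written = flushLiveOnly(buffered, liveDocs);

        // The new maxDoc must match what every consumer of the segment saw.
        System.out.println("docMap    = " + Arrays.toString(docMap));
        System.out.println("written   = " + written);
        System.out.println("newMaxDoc = " + written.size());

        // Edge case: everything deleted, so the segment would hold 0 docs.
        List<String> empty = flushLiveOnly(buffered, new BitSet(buffered.size()));
        System.out.println("all deleted -> newMaxDoc = " + empty.size());
    }
}
```

In this toy version a single list is filtered, so consistency is trivial. The concern in the comment is that a real segment has many independently written parts (postings, stored fields, doc values, and so on), and each would need to apply the same mapping and tolerate the empty case.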