[
https://issues.apache.org/jira/browse/LUCENE-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601205#comment-14601205
]
Adrien Grand commented on LUCENE-6553:
--------------------------------------
I just committed LUCENE-6601 so I'll commit this change to the 5.x branch as
well.
> Simplify how we handle deleted docs in read APIs
> ------------------------------------------------
>
> Key: LUCENE-6553
> URL: https://issues.apache.org/jira/browse/LUCENE-6553
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 5.3
>
> Attachments: LUCENE-6553.patch
>
>
> Today, all scorers and postings formats need to be able to handle deleted
> documents.
> I suspect that the reason is that we want to be able to make sure to not
> perform costly operations on documents that are deleted. For instance if you
> run a phrase query, reading positions on a document which is deleted is
> useless. I suspect this is also a source of inefficiencies since in some
> cases we apply deleted documents several times: for instance conjunctions
> apply deleted docs to every sub scorer.
> However, with the new two-phase iteration API, we have a way to make sure
> that we never run expensive operations on deleted documents: we could first
> iterate over the approximation, then check that the document is not deleted,
> and finally confirm the match. Since approximations are cheap, applying
> deleted docs after them would not be an issue.
> I would like to explore removing the "Bits acceptDocs" parameter from
> TermsEnum.postings, Weight.scorer, SpanWeight.getSpans and Weight.bulkScorer,
> and adding it to BulkScorer.score. This way, bulk scorers would be the only
> API that needs to know how to apply deleted docs, which I think would be
> more manageable since we only have 3 or 4 impls. DefaultBulkScorer would
> be implemented the way described above: first advance the approximation, then
> check deleted docs, then confirm the match, then collect. Of course, that
> only applies when the scorer supports approximations; if it does not, the
> scorer is cheap, so we can iterate it directly and check deleted docs on
> top.
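
For illustration only, the ordering described in the issue (advance the
approximation, check deleted docs, confirm the match, collect) could be
sketched roughly as below. This is not the Lucene API and not the attached
patch: TwoPhase, ArrayTwoPhase, score and the IntPredicate standing in for
"Bits acceptDocs" are all hypothetical names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

class TwoPhaseSketch {

    /** Hypothetical stand-in for a scorer with two-phase iteration (not Lucene's API). */
    interface TwoPhase {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Cheap step: returns the next candidate doc id, or NO_MORE_DOCS. */
        int nextApproximation();

        /** Costly step: confirms whether the current candidate really matches. */
        boolean matches();
    }

    /**
     * Sketch of a bulk-scoring loop: deleted docs are filtered after the cheap
     * approximation and before the costly confirmation.
     * acceptDocs is a hypothetical replacement for "Bits acceptDocs"; null means no deletions.
     */
    static List<Integer> score(TwoPhase scorer, IntPredicate acceptDocs) {
        List<Integer> collected = new ArrayList<>();
        for (int doc = scorer.nextApproximation();
             doc != TwoPhase.NO_MORE_DOCS;
             doc = scorer.nextApproximation()) {
            if (acceptDocs != null && !acceptDocs.test(doc)) {
                continue; // deleted: skip without running the costly confirmation
            }
            if (scorer.matches()) {
                collected.add(doc); // stands in for collector.collect(doc)
            }
        }
        return collected;
    }

    /** Toy scorer over fixed arrays; counts how often the costly step runs. */
    static final class ArrayTwoPhase implements TwoPhase {
        private final int[] candidates;
        private final Set<Integer> confirmed;
        private int pos = -1;
        int confirmCalls = 0;

        ArrayTwoPhase(int[] candidates, Set<Integer> confirmed) {
            this.candidates = candidates;
            this.confirmed = confirmed;
        }

        @Override
        public int nextApproximation() {
            pos++;
            return pos < candidates.length ? candidates[pos] : NO_MORE_DOCS;
        }

        @Override
        public boolean matches() {
            confirmCalls++;
            return confirmed.contains(candidates[pos]);
        }
    }

    public static void main(String[] args) {
        // Candidates 1, 3, 5, 8; only 1, 5, 8 would be confirmed; doc 5 is deleted.
        ArrayTwoPhase scorer =
            new ArrayTwoPhase(new int[] {1, 3, 5, 8}, new HashSet<>(Arrays.asList(1, 5, 8)));
        List<Integer> hits = score(scorer, doc -> doc != 5);
        System.out.println(hits + " after " + scorer.confirmCalls + " confirmations");
    }
}
```

In this toy run the deleted doc 5 is never confirmed, so the costly step runs
three times instead of four. A scorer without an approximation would simply
be iterated directly, with the same acceptDocs check applied on top.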