[jira] [Commented] (LUCENE-6553) Simplify how we handle deleted docs in read APIs

David Smiley (JIRA) Tue, 23 Jun 2015 20:48:04 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598824#comment-14598824
 ]


David Smiley commented on LUCENE-6553:
--------------------------------------

+1 LGTM.  

Wow the patch is a doozy.  At least mostly it's changing parameters at call 
sites.  For the most part the patch seems to reduce the lines of code (a good 
thing), particularly at the low-level postings format parts, so I think that's 
good.

bq. One thing I am unsure about is whether LeafReader.postings should still 
apply deleted docs or not.

Your chosen approach makes sense to me.  If it were to do it, it takes away 
control of wether they should be filtered or not.

> Simplify how we handle deleted docs in read APIs
> ------------------------------------------------
>
>                 Key: LUCENE-6553
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6553
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: Trunk
>
>         Attachments: LUCENE-6553.patch
>
>
> Today, all scorers and postings formats need to be able to handle deleted 
> documents.
> I suspect that the reason is that we want to be able to make sure to not 
> perform costly operations on documents that are deleted. For instance if you 
> run a phrase query, reading positions on a document which is deleted is 
> useless. I suspect this is also a source of inefficiencies since in some 
> cases we apply deleted documents several times: for instance conjunctions 
> apply deleted docs to every sub scorer.
> However, with the new two-phase iteration API, we have a way to make sure 
> that we never run expensive operations on deleted documents: we could first 
> iterate over the approximation, then check that the document is not deleted, 
> and finally confirm the match. Since approximations are cheap, applying 
> deleted docs after them would not be an issue.
> I would like to explore removing the "Bits acceptDocs" parameter from 
> TermsEnum.postings, Weight.scorer, SpanWeight.getSpans and Weight.BulkScorer, 
> and add it to BulkScorer.score. This way, bulk scorers would be the only API 
> which would need to know how to apply deleted docs, which I think would be 
> more manageable since we only have 3 or 4 impls. And DefaultBulkScorer would 
> be implemented the way described above: first advance the approximation, then 
> check deleted docs, then confirm the match, then collect. Of course that's 
> only in the case the scorer supports approximations, if it does not, it means 
> it is cheap so we can directly iterate the scorer and check deleted docs on 
> top.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-6553) Simplify how we handle deleted docs in read APIs

Reply via email to