[ https://issues.apache.org/jira/browse/LUCENE-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599055#comment-14599055 ]
Adrien Grand commented on LUCENE-6553:
--------------------------------------
Luceneutil on wikimedium10M again, but without deleted documents this time:
{code}
IntNRQ               9.57  (5.8%)      9.31  (6.6%)   -2.7% (-14% -  10%)
Prefix3            253.58  (3.5%)    249.27  (3.4%)   -1.7% ( -8% -   5%)
LowTerm            695.13  (2.9%)    685.91  (2.9%)   -1.3% ( -6% -   4%)
Wildcard            51.13  (3.6%)     50.49  (4.3%)   -1.3% ( -8% -   6%)
LowSloppyPhrase     13.87  (5.3%)     13.71  (5.4%)   -1.1% (-11% -  10%)
MedPhrase           99.70  (3.2%)     98.69  (4.3%)   -1.0% ( -8% -   6%)
Fuzzy1              86.60 (11.0%)     85.75 (11.0%)   -1.0% (-20% -  23%)
Respell            103.93  (3.3%)    103.18  (3.5%)   -0.7% ( -7% -   6%)
HighSloppyPhrase     8.18  (5.6%)      8.13  (5.9%)   -0.7% (-11% -  11%)
OrHighLow           55.24  (6.4%)     54.90  (6.9%)   -0.6% (-13% -  13%)
HighPhrase           8.42  (5.9%)      8.37  (6.4%)   -0.6% (-12% -  12%)
OrHighMed           19.64  (6.4%)     19.52  (7.2%)   -0.6% (-13% -  13%)
LowPhrase           58.69  (2.2%)     58.34  (2.4%)   -0.6% ( -5% -   4%)
MedSloppyPhrase     43.44  (5.4%)     43.21  (5.3%)   -0.5% (-10% -  10%)
OrHighHigh          39.31  (6.5%)     39.14  (6.9%)   -0.4% (-12% -  13%)
AndHighLow         690.71  (5.1%)    688.77  (4.3%)   -0.3% ( -9% -   9%)
OrNotHighMed       153.25  (1.8%)    152.97  (1.9%)   -0.2% ( -3% -   3%)
AndHighHigh         65.10  (2.6%)     65.08  (3.2%)   -0.0% ( -5% -   5%)
OrNotHighHigh       46.47  (1.4%)     46.47  (1.9%)   -0.0% ( -3% -   3%)
AndHighMed         168.75  (2.3%)    168.79  (2.2%)    0.0% ( -4% -   4%)
MedSpanNear         61.15  (3.9%)     61.41  (3.5%)    0.4% ( -6% -   8%)
OrNotHighLow      1137.12  (4.0%)   1142.11  (3.5%)    0.4% ( -6% -   8%)
OrHighNotHigh       54.49  (1.7%)     54.74  (1.9%)    0.5% ( -3% -   4%)
LowSpanNear         14.95  (2.8%)     15.02  (2.9%)    0.5% ( -5% -   6%)
OrHighNotMed        41.44  (2.5%)     41.73  (2.6%)    0.7% ( -4% -   5%)
MedTerm            289.16  (3.5%)    292.24  (2.9%)    1.1% ( -5% -   7%)
OrHighNotLow        87.80  (3.3%)     88.86  (3.1%)    1.2% ( -5% -   7%)
HighTerm            81.86  (3.9%)     83.56  (3.5%)    2.1% ( -5% -   9%)
HighSpanNear        42.21  (3.5%)     43.33  (4.2%)    2.6% ( -4% -  10%)
Fuzzy2              58.86 (15.6%)     60.45  (9.4%)    2.7% (-19% -  32%)
{code}
All differences look like noise to me?
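For anyone who wants to sanity-check the "noise" reading, here is a rough sketch of one way to do it. It treats each row as two mean QPS values with a relative standard deviation and asks whether the +/- one-standard-deviation bands overlap. This is only an illustrative heuristic, not the statistic luceneutil itself reports; the class and method names are made up, and pctDiff assumes the first QPS column is the baseline.

```java
/** Hypothetical helper for reading a benchmark row; not part of luceneutil or Lucene. */
final class NoiseCheck {

    /** Relative change of cmp versus base, in percent (assumes base is the first QPS column). */
    static double pctDiff(double baseQps, double cmpQps) {
        return 100.0 * (cmpQps - baseQps) / baseQps;
    }

    /**
     * Returns true if the bands [qps * (1 - sd), qps * (1 + sd)] of the two runs overlap.
     * Standard deviations are given in percent of the mean, as printed in the table.
     */
    static boolean bandsOverlap(double qpsA, double stdDevPctA, double qpsB, double stdDevPctB) {
        double lowA = qpsA * (1 - stdDevPctA / 100.0);
        double highA = qpsA * (1 + stdDevPctA / 100.0);
        double lowB = qpsB * (1 - stdDevPctB / 100.0);
        double highB = qpsB * (1 + stdDevPctB / 100.0);
        return lowA <= highB && lowB <= highA;
    }

    public static void main(String[] args) {
        // IntNRQ row: 9.57 (5.8%) versus 9.31 (6.6%)
        System.out.println("IntNRQ diff %: " + pctDiff(9.57, 9.31));
        System.out.println("IntNRQ bands overlap: " + bandsOverlap(9.57, 5.8, 9.31, 6.6));
        // HighSpanNear row: 42.21 (3.5%) versus 43.33 (4.2%)
        System.out.println("HighSpanNear bands overlap: " + bandsOverlap(42.21, 3.5, 43.33, 4.2));
    }
}
```

Overlapping bands do not prove that two runs are equivalent; they only mean the measured difference is smaller than the run-to-run variation, which is consistent with noise.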
> Simplify how we handle deleted docs in read APIs
> ------------------------------------------------
>
> Key: LUCENE-6553
> URL: https://issues.apache.org/jira/browse/LUCENE-6553
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: Trunk
>
> Attachments: LUCENE-6553.patch
>
>
> Today, all scorers and postings formats need to be able to handle deleted
> documents.
> I suspect the reason is that we want to avoid performing costly operations
> on deleted documents. For instance, if you run a phrase query, reading
> positions on a deleted document is useless. I suspect this is also a source
> of inefficiency, since in some cases we apply deleted documents several
> times: for instance, conjunctions apply deleted docs to every sub scorer.
> However, with the new two-phase iteration API, we have a way to make sure
> that we never run expensive operations on deleted documents: we could first
> iterate over the approximation, then check that the document is not deleted,
> and finally confirm the match. Since approximations are cheap, applying
> deleted docs after them would not be an issue.
> I would like to explore removing the "Bits acceptDocs" parameter from
> TermsEnum.postings, Weight.scorer, SpanWeight.getSpans and Weight.bulkScorer,
> and adding it to BulkScorer.score. This way, bulk scorers would be the only
> API that needs to know how to apply deleted docs, which I think would be
> more manageable since we only have 3 or 4 impls. DefaultBulkScorer would
> be implemented the way described above: first advance the approximation, then
> check deleted docs, then confirm the match, then collect. Of course, that
> only applies when the scorer supports approximations. If it does not, the
> scorer is cheap, so we can iterate it directly and check deleted docs on
> top.
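
The ordering described above (advance the approximation, check deleted docs, confirm the match, collect) could be sketched roughly as follows. This is a self-contained illustration only: TwoPhase, ArrayTwoPhase and score are hypothetical stand-ins rather than Lucene types, and the sketch assumes that a null liveDocs means the segment has no deletions.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative sketch only; none of these types are Lucene APIs. */
final class TwoPhaseSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical two-phase view of a scorer. */
    interface TwoPhase {
        /** Next candidate doc ID from the cheap approximation, or NO_MORE_DOCS. */
        int nextApproximation();

        /** Costly confirmation step, e.g. reading positions for a phrase query. */
        boolean matches(int doc);
    }

    /** Toy implementation backed by a sorted array of candidates; counts confirmations. */
    static final class ArrayTwoPhase implements TwoPhase {
        private final int[] candidates;
        private final IntPredicate confirm;
        private int next = 0;
        int confirmations = 0;

        ArrayTwoPhase(int[] candidates, IntPredicate confirm) {
            this.candidates = candidates;
            this.confirm = confirm;
        }

        @Override
        public int nextApproximation() {
            return next < candidates.length ? candidates[next++] : NO_MORE_DOCS;
        }

        @Override
        public boolean matches(int doc) {
            confirmations++;
            return confirm.test(doc);
        }
    }

    /** Collects matching live docs, applying deletions before the costly confirmation. */
    static List<Integer> score(TwoPhase twoPhase, BitSet liveDocs) {
        List<Integer> collected = new ArrayList<>();
        for (int doc = twoPhase.nextApproximation();
             doc != NO_MORE_DOCS;
             doc = twoPhase.nextApproximation()) {
            if (liveDocs != null && !liveDocs.get(doc)) {
                continue; // deleted: skip without running the costly check
            }
            if (twoPhase.matches(doc)) {
                collected.add(doc); // confirmed match on a live document
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        BitSet liveDocs = new BitSet();
        liveDocs.set(0, 10);
        liveDocs.clear(3); // doc 3 is deleted
        ArrayTwoPhase twoPhase = new ArrayTwoPhase(new int[] {1, 3, 5, 8}, doc -> doc != 5);
        System.out.println("collected: " + score(twoPhase, liveDocs));
        System.out.println("confirmations: " + twoPhase.confirmations);
    }
}
```

In this toy run, doc 3 is dropped by the deleted-docs check before matches is ever called on it, which is the property the proposal relies on. A scorer without an approximation would skip the confirmation step and only apply the liveDocs check on top of plain iteration.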