[ https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6198:
---------------------------------
Attachment: LUCENE-6198.patch
I did some more benchmarking, and one change that helped was to flatten clauses
in ConjunctionDISI. In practice this means that {{+"A B" +C}} is now
approximated as {{+A +B +C}} instead of {{+(+A +B) +C}} (see attached patch).
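To illustrate what flattening does, here is a minimal sketch over a hypothetical clause tree. The {{Clause}}, {{Term}} and {{And}} types are invented for this example and are not the {{ConjunctionDISI}} API. Nested conjunctions are spliced into their parent so that a single intersection iterates over all leaf clauses. One plausible reason this helps is that one flat zig-zag can lead with the sparsest leaf instead of driving a nested conjunction as a single sub-iterator.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenConjunction {
    // Hypothetical clause model for this sketch: a single term or a conjunction of sub-clauses.
    interface Clause {}
    record Term(String text) implements Clause {}
    record And(List<Clause> clauses) implements Clause {}

    // Splices nested conjunctions into the parent so that only leaf clauses remain.
    static List<Clause> flatten(List<Clause> clauses) {
        List<Clause> flat = new ArrayList<>();
        for (Clause c : clauses) {
            if (c instanceof And nested) {
                flat.addAll(flatten(nested.clauses()));
            } else {
                flat.add(c);
            }
        }
        return flat;
    }

    // Renders clauses in the +X notation used in the comment, e.g. "+(+A +B) +C".
    static String render(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+');
            if (c instanceof Term t) {
                sb.append(t.text());
            } else {
                sb.append('(').append(render(((And) c).clauses())).append(')');
            }
        }
        return sb.toString();
    }

    // Approximation of +"A B" +C: the phrase is approximated by the conjunction of its terms.
    static List<Clause> exampleQuery() {
        return List.of(new And(List.of(new Term("A"), new Term("B"))), new Term("C"));
    }

    public static void main(String[] args) {
        List<Clause> query = exampleQuery();
        System.out.println(render(query));          // +(+A +B) +C
        System.out.println(render(flatten(query))); // +A +B +C
    }
}
```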
Here are results on wikibig:
{noformat}
Task                   QPS baseline   StdDev   QPS patch   StdDev   Pct diff
AndMedPhraseHighTerm          21.19   (6.1%)       19.98   (2.6%)    -5.7% ( -13% -  3%)
PKLookup                     334.11   (2.1%)      334.82   (2.2%)     0.2% (  -4% -  4%)
AndHighPhraseHighTerm         11.64   (4.1%)       11.83   (2.4%)     1.6% (  -4% -  8%)
AndHighPhraseMedTerm          19.19   (2.5%)       21.99   (2.1%)    14.6% (   9% - 19%)
AndMedPhraseMedTerm           58.27   (6.3%)       67.53   (6.6%)    15.9% (   2% - 30%)
AndHighPhraseLowTerm          35.07   (5.6%)       42.46   (6.1%)    21.1% (   8% - 34%)
AndMedPhraseLowTerm           93.39   (8.0%)      128.24  (13.3%)    37.3% (  14% - 63%)
{noformat}
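The Pct diff column appears to be the relative change in mean QPS from baseline to patch. A minimal sketch, assuming that definition, recomputes it from the QPS columns of the benchmark table. The bracketed range, which also accounts for the standard deviations, is not reproduced here.

```java
import java.util.Locale;

public class PctDiff {
    // Assumed definition: relative change of the patch QPS versus the baseline QPS, in percent.
    static double pctDiff(double baselineQps, double patchQps) {
        return (patchQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        String[] tasks = {"AndMedPhraseHighTerm", "PKLookup", "AndHighPhraseHighTerm",
            "AndHighPhraseMedTerm", "AndMedPhraseMedTerm", "AndHighPhraseLowTerm", "AndMedPhraseLowTerm"};
        // {baseline QPS, patch QPS} for each task, copied from the benchmark table.
        double[][] qps = {{21.19, 19.98}, {334.11, 334.82}, {11.64, 11.83},
            {19.19, 21.99}, {58.27, 67.53}, {35.07, 42.46}, {93.39, 128.24}};
        for (int i = 0; i < tasks.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale.
            System.out.println(String.format(Locale.ROOT, "%-22s %6.1f%%",
                tasks[i], pctDiff(qps[i][0], qps[i][1])));
        }
    }
}
```

Each printed value matches the Pct diff column, for example -5.7% for AndMedPhraseHighTerm and 37.3% for AndMedPhraseLowTerm.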
I was curious about the slowdown on AndMedPhraseHighTerm, and it seems to be
tied to the fact that the query terms are not random. For instance, one query
of this task is {{+"los angeles" +title}}, which matches 30669 documents.
However, its approximation {{+los +angeles +title}} matches 30711 documents,
so the approximation filters out almost nothing and only adds overhead here.
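Putting numbers on this: with 30711 candidates and 30669 true matches, the position check rejects only 42 documents, roughly 0.14% of the candidates, so nearly every candidate still pays for reading positions. The calculation below is plain arithmetic on the counts quoted above, not part of the patch; "precision" here simply means matches divided by approximation candidates.

```java
import java.util.Locale;

public class ApproximationPrecision {
    // Fraction of approximation candidates that survive the exact (position-checking) phase.
    static double precision(long exactMatches, long approximationMatches) {
        return (double) exactMatches / approximationMatches;
    }

    public static void main(String[] args) {
        long exact = 30669;        // +"los angeles" +title
        long approximate = 30711;  // +los +angeles +title
        System.out.println("rejected by position checks: " + (approximate - exact)); // 42
        System.out.println(String.format(Locale.ROOT, "precision: %.4f",
            precision(exact, approximate)));                                         // 0.9986
    }
}
```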
> two phase intersection
> ----------------------
>
> Key: LUCENE-6198
> URL: https://issues.apache.org/jira/browse/LUCENE-6198
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Attachments: LUCENE-6198.patch, LUCENE-6198.patch, LUCENE-6198.patch,
> LUCENE-6198.patch, LUCENE-6198.patch, phrase_intersections.tasks
>
>
> Currently some scorers have to do a lot of per-document work to determine if
> a document is a match. The simplest example is a phrase scorer, but there are
> others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches
> all odd documents, another that is a phrase matching all even documents.
> Today this conjunction will be very expensive, because the zig-zag
> intersection is reading a ton of useless positions.
> The same problem happens with filteredQuery and anything else that acts like
> a conjunction.
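The idea in the description can be sketched with a hypothetical {{Clause}} type that pairs a cheap, sorted approximation of candidate doc IDs with an expensive per-document {{matches}} check (this is not the Lucene API). The zig-zag runs over the approximations only, and the expensive checks run only on documents that every approximation accepts. The example models the odd/even case from the description and assumes both phrase terms occur in every document, so the phrase approximation is all documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TwoPhaseConjunction {
    // Hypothetical two-phase clause: a cheap sorted approximation of candidate doc IDs
    // plus an expensive per-document confirmation (e.g. reading positions for a phrase).
    record Clause(int[] approximation, IntPredicate matches) {}

    // Documents on which the phrase confirmation ran (stands in for position reads).
    static final List<Integer> checkedByPhrase = new ArrayList<>();

    // Returns the first index >= from whose doc ID is >= target (linear scan for brevity).
    static int advance(int[] docs, int from, int target) {
        while (from < docs.length && docs[from] < target) {
            from++;
        }
        return from;
    }

    // Zig-zag over the approximations only; confirm a document once both approximations agree on it.
    static List<Integer> intersect(Clause first, Clause second) {
        int[] a = first.approximation();
        int[] b = second.approximation();
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i = advance(a, i, b[j]);
            } else if (b[j] < a[i]) {
                j = advance(b, j, a[i]);
            } else {
                int doc = a[i];
                // Second phase: the costly checks run only on documents both approximations accepted.
                if (first.matches().test(doc) && second.matches().test(doc)) {
                    hits.add(doc);
                }
                i++;
                j++;
            }
        }
        return hits;
    }

    static List<Integer> runOddEvenExample() {
        checkedByPhrase.clear();
        int maxDoc = 100;
        int[] allDocs = new int[maxDoc];
        int[] oddDocs = new int[maxDoc / 2];
        for (int d = 0; d < maxDoc; d++) {
            allDocs[d] = d;
            if (d % 2 == 1) {
                oddDocs[d / 2] = d;
            }
        }
        // Term clause: matches all odd documents and needs no confirmation.
        Clause term = new Clause(oddDocs, doc -> true);
        // Phrase clause: assumed approximation is every document, but the phrase matches only even ones.
        Clause phrase = new Clause(allDocs, doc -> {
            checkedByPhrase.add(doc);
            return doc % 2 == 0;
        });
        return intersect(term, phrase);
    }

    public static void main(String[] args) {
        List<Integer> hits = runOddEvenExample();
        System.out.println("hits=" + hits + ", phrase checks=" + checkedByPhrase.size());
    }
}
```

Running {{main}} prints {{hits=[], phrase checks=50}}: the phrase check runs only on the 50 odd documents the term clause already accepted, never on the even documents it rules out. A production version would likely also order clauses by cost and use skipping rather than a linear scan.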
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)