[ https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6198:
---------------------------------
Attachment: phrase_intersections.tasks
I built some tasks for intersections of phrases with terms and ran luceneutil
on them to validate that the patch does indeed speed up such queries:
{noformat}
Task                    QPS baseline  StdDev  QPS patch  StdDev  Pct diff
PKLookup                      247.13  (2.0%)     248.14  (1.9%)   0.4% ( -3% -  4%)
AndMedPhraseLowTerm            13.74  (0.7%)      14.67  (2.8%)   6.7% (  3% - 10%)
AndHighPhraseHighTerm           6.03  (0.9%)       6.45  (0.8%)   7.0% (  5% -  8%)
AndMedPhraseHighTerm           45.62  (2.6%)      49.62  (1.7%)   8.8% (  4% - 13%)
AndMedPhraseMedTerm            49.14  (2.8%)      58.40  (5.7%)  18.8% ( 10% - 28%)
AndHighPhraseMedTerm           11.81  (1.5%)      15.02  (2.2%)  27.1% ( 23% - 31%)
AndHighPhraseLowTerm           31.43  (3.5%)      41.39  (6.2%)  31.7% ( 21% - 42%)
{noformat}
> two phase intersection
> ----------------------
>
> Key: LUCENE-6198
> URL: https://issues.apache.org/jira/browse/LUCENE-6198
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Robert Muir
> Attachments: LUCENE-6198.patch, LUCENE-6198.patch, LUCENE-6198.patch,
> LUCENE-6198.patch, phrase_intersections.tasks
>
>
> Currently some scorers have to do a lot of per-document work to determine if
> a document is a match. The simplest example is a phrase scorer, but there are
> others (spans, sloppy phrase, geospatial, etc.).
> Imagine a conjunction with two MUST clauses, one that is a term that matches
> all odd documents, another that is a phrase matching all even documents.
> Today this conjunction will be very expensive, because the zig-zag
> intersection is reading a ton of useless positions.
> The same problem happens with FilteredQuery and anything else that acts like
> a conjunction.
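
To make the idea in the description concrete, here is a minimal, self-contained sketch of a two-phase conjunction. It is not the code from the attached patch and does not use Lucene's classes: `TwoPhaseSketch`, `Approximation`, `Clause`, and `intersect` are hypothetical names, doc IDs come from plain sorted arrays, and the "expensive" phrase check is simulated by a predicate with a call counter. The point it illustrates is that the zig-zag runs over cheap approximations only, and the costly confirmation happens just on documents where all approximations agree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: names and structure are assumptions, not Lucene's API.
class TwoPhaseSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Phase 1: a cheap iterator over sorted candidate doc IDs. */
    static final class Approximation {
        final int[] docs;
        int idx = -1;

        Approximation(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Moves past the current doc to the first candidate >= target. */
        int advance(int target) {
            do {
                idx++;
            } while (idx < docs.length && docs[idx] < target);
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** A clause = cheap approximation + costly per-document confirmation. */
    static final class Clause {
        final Approximation approx;
        final IntPredicate confirm; // phase 2, e.g. reading positions
        int checks = 0;             // how often phase 2 actually ran

        Clause(int[] candidates, IntPredicate confirm) {
            this.approx = new Approximation(candidates);
            this.confirm = confirm;
        }

        boolean matches(int doc) {
            checks++;
            return confirm.test(doc);
        }
    }

    /** Zig-zag over the approximations; confirm only where both agree. */
    static List<Integer> intersect(Clause a, Clause b) {
        List<Integer> hits = new ArrayList<>();
        int docA = a.approx.advance(0);
        int docB = b.approx.advance(0);
        while (docA != NO_MORE_DOCS && docB != NO_MORE_DOCS) {
            if (docA < docB) {
                docA = a.approx.advance(docB);
            } else if (docB < docA) {
                docB = b.approx.advance(docA);
            } else {
                // Both approximations agree: only now pay for verification.
                if (a.matches(docA) && b.matches(docA)) {
                    hits.add(docA);
                }
                docA = a.approx.advance(docA + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Term clause: matches odd docs, confirmation is trivially true.
        Clause term = new Clause(new int[] {1, 3, 5, 7, 9}, doc -> true);
        // Phrase clause: candidates contain all phrase terms, but the
        // phrase itself only occurs in even docs and in doc 9.
        Clause phrase = new Clause(new int[] {0, 2, 3, 4, 6, 8, 9},
                doc -> doc % 2 == 0 || doc == 9);
        System.out.println("hits=" + intersect(term, phrase)
                + " phraseChecks=" + phrase.checks);
    }
}
```

In this toy setup the phrase approximation has seven candidates, but the phrase confirmation runs only for the two documents that the term clause also contains. A scorer that verified the phrase inside `advance` would instead have to confirm every candidate it lands on while skipping forward.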