[jira] [Commented] (LUCENE-8448) Slowdown of nested boolean queries after LUCENE-8060

Jim Ferenczi (JIRA) Tue, 14 Aug 2018 05:19:32 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579721#comment-16579721 ]


Jim Ferenczi commented on LUCENE-8448:
--------------------------------------

We've tried several things with Adrien to optimize the nested boolean case.
Currently, boolean queries don't propagate the minimum competitive score to
their sub scorers. However, in the first version of max score, the
MaxScoreSumPropagator computed a minimum score per sub clause, based on the sum
of the maximum scores of the other clauses. This optimization was removed at
some point because it hurt simple boolean queries that contain only term
clauses. A lot has changed since then (max scores are now computed per block,
...), so we tried to revive this optimization and applied it to all boolean
scorers to run some benchmarks. We used wikimediumall and added the problematic
queries from the nightly benchmark; the results are below:
{noformat}
Task              QPS lucene_baseline  StdDev   QPS lucene_candidate  StdDev   Pct diff
OrHighMed         28.08                (7.7%)   27.37                 (8.8%)   -2.5% ( -17% - 15%)
AndHighHigh       21.15                (9.5%)   20.99                 (10.2%)  -0.8% ( -18% - 20%)
AndHighMed        58.19                (8.8%)   57.80                 (9.2%)   -0.7% ( -17% - 18%)
OrHighHigh        11.92                (7.7%)   11.90                 (9.2%)   -0.1% ( -15% - 18%)
OrHighLow         259.35               (7.2%)   261.80                (8.7%)   0.9% ( -13% - 18%)
OrNotHighLow      582.99               (7.8%)   588.83                (9.8%)   1.0% ( -15% - 20%)
Fuzzy2            56.86                (6.8%)   57.67                 (8.2%)   1.4% ( -12% - 17%)
AndHighLow        340.56               (7.4%)   345.60                (9.7%)   1.5% ( -14% - 20%)
Fuzzy1            53.38                (6.9%)   54.22                 (8.6%)   1.6% ( -13% - 18%)
Wildcard          17.41                (8.3%)   17.73                 (9.4%)   1.8% ( -14% - 21%)
Prefix3           22.16                (8.4%)   22.57                 (9.7%)   1.9% ( -14% - 21%)
OrNotHighMed      803.13               (8.2%)   818.85                (9.8%)   2.0% ( -14% - 21%)
HighTerm          1333.98              (8.1%)   1361.12               (10.1%)  2.0% ( -14% - 22%)
OrNotHighHigh     790.52               (7.7%)   806.66                (9.8%)   2.0% ( -14% - 21%)
OrHighNotLow      960.80               (8.8%)   981.56                (10.1%)  2.2% ( -15% - 22%)
Respell           42.76                (7.7%)   43.71                 (9.6%)   2.2% ( -13% - 21%)
MedTerm           1568.86              (8.1%)   1603.71               (10.1%)  2.2% ( -14% - 22%)
OrHighNotMed      999.26               (8.5%)   1022.44               (9.8%)   2.3% ( -14% - 22%)
OrHighNotHigh     791.65               (8.5%)   811.37                (10.4%)  2.5% ( -15% - 23%)
LowTerm           1611.84              (8.5%)   1660.90               (10.1%)  3.0% ( -14% - 23%)
AndMedOrHighHigh  5.53                 (6.6%)   8.94                  (12.8%)  61.6% ( 39% - 86%)
AndHighOrMedMed   8.45                 (7.3%)   29.90                 (33.6%)  253.8% ( 198% - 318%)
AndHighOrMedLow   13.68                (7.4%)   58.86                 (37.4%)  330.2% ( 265% - 405%)
AndMedOrHighLow   2.01                 (6.1%)   24.43                 (92.6%)  1118.1% ( 960% - 1295%)
{noformat}

The AndMedOrHighHigh and AndHighOrMedMed queries show a nice speedup. I also
created AndHighOrMedLow and AndMedOrHighLow to show other kinds of speedup for
nested boolean queries.
We also tested other improvements, but they didn't work as well as this one and
deserve their own issues (which I'll open as a follow-up).
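To make the per-clause bound concrete, here is a minimal Java sketch, assuming a query whose score is the sum of its clause scores. The class and method names are hypothetical and this is not the actual MaxScoreSumPropagator code: for each clause, it subtracts the maximum scores of all the other clauses from the minimum competitive score and clamps the result at zero.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Lucene's MaxScoreSumPropagator API. Given the
 * maximum score each clause of a sum-of-scores boolean query can produce,
 * derive the minimum score each clause must reach for the whole query to
 * reach the minimum competitive score.
 */
public final class MinScorePropagationSketch {

  /**
   * Clause i can only help produce a competitive hit if
   * score_i >= minCompetitiveScore - (sum of max scores of the other clauses).
   * Negative bounds prune nothing, so they are clamped to 0.
   * Note: this uses plain double arithmetic; a real implementation would have
   * to round these bounds conservatively so no competitive hit is skipped.
   */
  static double[] perClauseMinScores(double[] maxScores, double minCompetitiveScore) {
    double total = 0;
    for (double s : maxScores) {
      total += s;
    }
    double[] mins = new double[maxScores.length];
    for (int i = 0; i < maxScores.length; i++) {
      double othersMax = total - maxScores[i]; // best the other clauses can add
      mins[i] = Math.max(0, minCompetitiveScore - othersMax);
    }
    return mins;
  }

  public static void main(String[] args) {
    // Hypothetical nested query +medTerm +(highTermA highTermB): the top-level
    // conjunction sums a term clause and a nested disjunction clause.
    double medTermMax = 2.0;
    double nestedOrMax = 3.5;
    double minCompetitive = 4.0;
    double[] mins = perClauseMinScores(new double[] {medTermMax, nestedOrMax}, minCompetitive);
    System.out.println(Arrays.toString(mins)); // prints [0.5, 2.0]
  }
}
```

With these made-up numbers, the nested disjunction only needs to consider documents on which it scores at least 2.0, which it can use to skip blocks whose max score is lower. That kind of pruning inside the nested clause is presumably where queries like AndMedOrHighHigh gain.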

> Slowdown of nested boolean queries after LUCENE-8060
> ----------------------------------------------------
>
>                 Key: LUCENE-8448
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8448
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8448.patch
>
>
> Mike's nightly benchmarks revealed that disabling hit counts slowed down 
> nested boolean queries 
> http://people.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html 
> http://people.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html.
> We are probably not propagating max scores and/or blocks efficiently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
