[ https://issues.apache.org/jira/browse/LUCENE-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579721#comment-16579721 ]
Jim Ferenczi commented on LUCENE-8448:
--------------------------------------

Adrien and I have tried several things to optimize the nested boolean case. Currently, boolean queries don't propagate the minimum score to their sub scorers. However, in the first version of max score, the MaxScoreSumPropagator used to compute a minimum score per sub clause based on the sum of the maximum scores of the other clauses. This optimization was removed at some point because it had a bad effect on simple boolean queries that contain only term clauses.

A lot has changed in the meantime (max scores are now computed per block, ...), so we revived this optimization and applied it to all boolean scorers in order to run some benchmarks. We used wikimediumall and added the problematic queries from the nightly benchmark; the results are below:

{noformat}
Task                QPS lucene_baseline   StdDev  QPS lucene_candidate   StdDev  Pct diff
OrHighMed                         28.08   (7.7%)                 27.37   (8.8%)    -2.5% ( -17% - 15%)
AndHighHigh                       21.15   (9.5%)                 20.99  (10.2%)    -0.8% ( -18% - 20%)
AndHighMed                        58.19   (8.8%)                 57.80   (9.2%)    -0.7% ( -17% - 18%)
OrHighHigh                        11.92   (7.7%)                 11.90   (9.2%)    -0.1% ( -15% - 18%)
OrHighLow                        259.35   (7.2%)                261.80   (8.7%)     0.9% ( -13% - 18%)
OrNotHighLow                     582.99   (7.8%)                588.83   (9.8%)     1.0% ( -15% - 20%)
Fuzzy2                            56.86   (6.8%)                 57.67   (8.2%)     1.4% ( -12% - 17%)
AndHighLow                       340.56   (7.4%)                345.60   (9.7%)     1.5% ( -14% - 20%)
Fuzzy1                            53.38   (6.9%)                 54.22   (8.6%)     1.6% ( -13% - 18%)
Wildcard                          17.41   (8.3%)                 17.73   (9.4%)     1.8% ( -14% - 21%)
Prefix3                           22.16   (8.4%)                 22.57   (9.7%)     1.9% ( -14% - 21%)
OrNotHighMed                     803.13   (8.2%)                818.85   (9.8%)     2.0% ( -14% - 21%)
HighTerm                        1333.98   (8.1%)               1361.12  (10.1%)     2.0% ( -14% - 22%)
OrNotHighHigh                    790.52   (7.7%)                806.66   (9.8%)     2.0% ( -14% - 21%)
OrHighNotLow                     960.80   (8.8%)                981.56  (10.1%)     2.2% ( -15% - 22%)
Respell                           42.76   (7.7%)                 43.71   (9.6%)     2.2% ( -13% - 21%)
MedTerm                         1568.86   (8.1%)               1603.71  (10.1%)     2.2% ( -14% - 22%)
OrHighNotMed                     999.26   (8.5%)               1022.44   (9.8%)     2.3% ( -14% - 22%)
OrHighNotHigh                    791.65   (8.5%)                811.37  (10.4%)     2.5% ( -15% - 23%)
LowTerm                         1611.84   (8.5%)               1660.90  (10.1%)     3.0% ( -14% - 23%)
AndMedOrHighHigh                   5.53   (6.6%)                  8.94  (12.8%)    61.6% ( 39% - 86%)
AndHighOrMedMed                    8.45   (7.3%)                 29.90  (33.6%)   253.8% ( 198% - 318%)
AndHighOrMedLow                   13.68   (7.4%)                 58.86  (37.4%)   330.2% ( 265% - 405%)
AndMedOrHighLow                    2.01   (6.1%)                 24.43  (92.6%)  1118.1% ( 960% - 1295%)
{noformat}

The AndMedOrHighHigh and AndHighOrMedMed tasks, the two problematic queries from the nightly benchmark, show a nice speedup (61.6% and 253.8% respectively). I also created AndHighOrMedLow and AndMedOrHighLow to show other types of speedup for nested boolean queries. The remaining tasks all change by 3% or less, which is well within the reported standard deviations.

We also tested other improvements, but they didn't work as well as this one and deserve their own issues, which I'll open as a follow-up.

> Slowdown of nested boolean queries after LUCENE-8060
> ----------------------------------------------------
>
>                 Key: LUCENE-8448
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8448
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8448.patch
>
>
> Mike's nightly benchmarks revealed that disabling hit counts slowed down
> nested boolean queries:
> http://people.apache.org/~mikemccand/lucenebench/AndHighOrMedMed.html
> http://people.apache.org/~mikemccand/lucenebench/AndMedOrHighHigh.html
> We are probably not propagating max scores and/or blocks efficiently.
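
For readers unfamiliar with the optimization discussed in the comment above, the idea of deriving a minimum score per sub clause from the sum of the maximum scores of the other clauses can be sketched as follows. This is only an illustration under stated assumptions, not the code of MaxScoreSumPropagator or of the attached patch: the class name `MinScorePropagationSketch` and the method `perClauseMinScores` are hypothetical, the clause scores are assumed to be summed, and the floating-point rounding that a real scorer has to handle is ignored.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene code: names and structure are assumptions.
final class MinScorePropagationSketch {

    /**
     * For each clause i, returns the smallest score that clause must contribute
     * for a hit to reach minCompetitiveScore, assuming the hit's score is the sum
     * of the clause scores and every other clause contributes at most its
     * maximum score. Rounding errors are ignored in this sketch.
     */
    public static double[] perClauseMinScores(double[] maxScores, double minCompetitiveScore) {
        double sumOfMaxScores = 0;
        for (double maxScore : maxScores) {
            sumOfMaxScores += maxScore;
        }
        double[] minScores = new double[maxScores.length];
        for (int i = 0; i < maxScores.length; i++) {
            // Best case for the other clauses: they all reach their maximum score.
            double sumOfOthers = sumOfMaxScores - maxScores[i];
            // A clause can never be required to contribute a negative score.
            minScores[i] = Math.max(0, minCompetitiveScore - sumOfOthers);
        }
        return minScores;
    }

    public static void main(String[] args) {
        // Three clauses with maximum scores 2.0, 1.0 and 0.5; hits need at least 3.0.
        double[] minScores = perClauseMinScores(new double[] {2.0, 1.0, 0.5}, 3.0);
        System.out.println(Arrays.toString(minScores)); // prints [1.5, 0.5, 0.0]
    }
}
```

In this example the first clause has to contribute at least 1.5 on its own, so a sub scorer that receives this bound could skip any document (or block) where it cannot reach that score. Propagating such bounds to nested sub scorers is the kind of skipping the comment describes.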