zacharymorn commented on a change in pull request #113:
URL: https://github.com/apache/lucene/pull/113#discussion_r631568912
##########
File path: lucene/core/src/java/org/apache/lucene/search/BMMBulkScorer.java
##########
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import static org.apache.lucene.search.ScorerUtil.costWithMinShouldMatch;
+
+import java.io.IOException;
+import java.util.*;
+import org.apache.lucene.util.Bits;
+
+/** BulkScorer that leverages BMM algorithm within interval (min, max) */
+public class BMMBulkScorer extends BulkScorer {
+  private List<Scorer> scorers;
+  private DisiWrapper[] allScorers;
+  private Weight weight;
+  private ScoreMode scoreMode;
+  private int scalingFactor;
+  private long cost;
+  private static final int FIXED_WINDOW_SIZE = 2048;

Review comment:
I also ran wikibigall for the above changes, following the suggestions from https://github.com/apache/lucene/pull/101#issuecomment-837909869, and got the following results:

wikibigall run 1
```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                   p-value
Fuzzy1                           51.08   (9.7%)                    44.15  (11.5%)    -13.6%  ( -31% -    8%)    0.000
Fuzzy2                           51.90  (12.0%)                    48.39  (10.3%)     -6.8%  ( -25% -   17%)    0.056
TermDayOfYearSort               160.58  (12.1%)                   156.99  (11.2%)     -2.2%  ( -22% -   23%)    0.542
TermMonthSort                    58.61   (8.4%)                    57.57   (8.0%)     -1.8%  ( -16% -   15%)    0.494
TermDTSort                      104.08  (11.4%)                   102.37   (9.4%)     -1.6%  ( -20% -   21%)    0.619
TermTitleSort                   104.99   (8.3%)                   103.29   (7.6%)     -1.6%  ( -16% -   15%)    0.519
AndHighOrMedMed                  33.52   (3.1%)                    33.02   (2.8%)     -1.5%  (  -7% -    4%)    0.114
AndHighHigh                      18.08   (4.6%)                    17.87   (4.1%)     -1.1%  (  -9% -    7%)    0.406
TermBGroup1M                     14.14   (4.1%)                    14.03   (3.5%)     -0.8%  (  -8% -    7%)    0.491
TermDateFacets                    7.58   (5.5%)                     7.53   (6.1%)     -0.7%  ( -11% -   11%)    0.714
Phrase                           10.38   (1.8%)                    10.32   (2.2%)     -0.6%  (  -4% -    3%)    0.359
AndHighMed                       82.33   (4.0%)                    81.87   (3.9%)     -0.6%  (  -8% -    7%)    0.655
SloppyPhrase                      2.32   (8.3%)                     2.31   (9.9%)     -0.5%  ( -17% -   19%)    0.855
TermGroup100                     34.52   (3.7%)                    34.36   (3.0%)     -0.5%  (  -6% -    6%)    0.650
TermBGroup1M1P                   43.50   (3.8%)                    43.30   (4.0%)     -0.5%  (  -7% -    7%)    0.700
AndMedOrHighHigh                 25.62   (3.4%)                    25.51   (3.1%)     -0.4%  (  -6% -    6%)    0.666
TermGroup1M                      15.43   (3.0%)                    15.37   (2.7%)     -0.4%  (  -5% -    5%)    0.668
VectorSearch                    823.98   (1.9%)                   820.96   (2.6%)     -0.4%  (  -4% -    4%)    0.616
PKLookup                        210.69   (2.6%)                   210.23   (2.5%)     -0.2%  (  -5% -    4%)    0.782
BrowseMonthSSDVFacets            18.90   (0.8%)                    18.87   (0.9%)     -0.2%  (  -1% -    1%)    0.574
BrowseDayOfYearTaxoFacets         7.14   (5.3%)                     7.14   (5.8%)     -0.1%  ( -10% -   11%)    0.943
Wildcard                         38.83   (2.4%)                    38.79   (2.5%)     -0.1%  (  -4% -    4%)    0.881
TermGroup10K                     18.47   (2.9%)                    18.45   (2.5%)     -0.1%  (  -5% -    5%)    0.910
SpanNear                          4.76   (1.4%)                     4.76   (1.3%)     -0.1%  (  -2% -    2%)    0.885
Prefix3                         173.23   (6.8%)                   173.13   (6.8%)     -0.1%  ( -12% -   14%)    0.978
BrowseDateTaxoFacets              7.46   (5.5%)                     7.46   (6.1%)     -0.1%  ( -10% -   12%)    0.976
BrowseMonthTaxoFacets             8.27   (5.7%)                     8.27   (6.4%)     -0.0%  ( -11% -   12%)    0.986
Respell                          41.11   (2.8%)                    41.12   (2.6%)      0.0%  (  -5% -    5%)    0.972
BrowseDayOfYearSSDVFacets        17.13   (1.8%)                    17.14   (1.7%)      0.1%  (  -3% -    3%)    0.887
IntNRQ                          267.98   (2.2%)                   268.69   (2.5%)      0.3%  (  -4% -    5%)    0.721
IntervalsOrdered                  3.79   (2.1%)                     3.81   (2.4%)      0.5%  (  -3% -    5%)    0.448
Term                           1046.89   (7.5%)                  1067.03   (7.0%)      1.9%  ( -11% -   17%)    0.401
OrHighMed                        34.43   (3.2%)                    37.66   (5.5%)      9.4%  (   0% -   18%)    0.000
OrHighHigh                       16.93   (3.7%)                    25.19   (4.6%)     48.8%  (  39% -   59%)    0.000
```

wikibigall run 2
```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                   p-value
Fuzzy1                           50.87   (9.8%)                    47.08  (13.8%)     -7.5%  ( -28% -   17%)    0.049
Fuzzy2                           30.31   (5.6%)                    28.38   (9.6%)     -6.4%  ( -20% -    9%)    0.011
TermMonthSort                    59.51  (12.2%)                    58.67   (9.4%)     -1.4%  ( -20% -   23%)    0.683
TermDTSort                      172.78  (11.5%)                   170.44  (10.1%)     -1.4%  ( -20% -   22%)    0.692
AndMedOrHighHigh                  9.65   (3.1%)                     9.55   (2.8%)     -1.1%  (  -6% -    4%)    0.233
TermTitleSort                    59.25  (12.2%)                    58.60   (9.8%)     -1.1%  ( -20% -   23%)    0.754
TermDateFacets                    8.18   (7.5%)                     8.13   (7.7%)     -0.6%  ( -14% -   15%)    0.789
Respell                          46.60   (3.8%)                    46.33   (3.6%)     -0.6%  (  -7% -    7%)    0.628
IntervalsOrdered                  3.81   (2.7%)                     3.80   (2.8%)     -0.4%  (  -5% -    5%)    0.674
AndHighOrMedMed                  24.00   (3.0%)                    23.94   (3.4%)     -0.3%  (  -6% -    6%)    0.792
AndHighMed                       59.47   (3.1%)                    59.34   (3.6%)     -0.2%  (  -6% -    6%)    0.837
TermGroup1M                      22.27   (3.4%)                    22.23   (3.5%)     -0.1%  (  -6% -    7%)    0.895
BrowseDateTaxoFacets              7.28   (7.7%)                     7.27   (7.8%)     -0.1%  ( -14% -   16%)    0.956
BrowseDayOfYearTaxoFacets         6.97   (7.5%)                     6.96   (7.5%)     -0.1%  ( -14% -   16%)    0.958
BrowseMonthTaxoFacets             8.08   (7.7%)                     8.07   (7.9%)     -0.1%  ( -14% -   16%)    0.962
AndHighHigh                      64.73   (2.8%)                    64.67   (3.7%)     -0.1%  (  -6% -    6%)    0.921
Wildcard                         70.06   (3.1%)                    70.00   (3.2%)     -0.1%  (  -6% -    6%)    0.924
BrowseMonthSSDVFacets            18.76   (0.9%)                    18.77   (0.9%)      0.0%  (  -1% -    1%)    0.919
Phrase                           20.88   (3.8%)                    20.90   (3.2%)      0.1%  (  -6% -    7%)    0.936
TermGroup10K                     12.15   (3.7%)                    12.16   (4.0%)      0.1%  (  -7% -    8%)    0.931
TermBGroup1M1P                   15.29   (5.1%)                    15.31   (4.6%)      0.1%  (  -9% -   10%)    0.936
Prefix3                          32.94   (2.9%)                    32.99   (2.9%)      0.1%  (  -5% -    6%)    0.872
BrowseDayOfYearSSDVFacets        17.10   (1.7%)                    17.13   (1.7%)      0.2%  (  -3% -    3%)    0.768
TermGroup100                     34.25   (3.8%)                    34.34   (3.9%)      0.3%  (  -7% -    8%)    0.829
SloppyPhrase                      2.82   (7.5%)                     2.83   (7.4%)      0.3%  ( -13% -   16%)    0.900
TermDayOfYearSort                45.78  (11.8%)                    45.93  (10.6%)      0.3%  ( -19% -   25%)    0.926
SpanNear                         10.00   (1.2%)                    10.05   (1.2%)      0.4%  (  -1% -    2%)    0.253
IntNRQ                          108.69  (24.1%)                   109.25  (23.7%)      0.5%  ( -38% -   63%)    0.945
TermBGroup1M                     11.95   (4.5%)                    12.03   (5.2%)      0.7%  (  -8% -   10%)    0.661
PKLookup                        201.05   (6.0%)                   203.48   (4.0%)      1.2%  (  -8% -   11%)    0.451
Term                            667.45   (5.8%)                   683.87   (7.3%)      2.5%  ( -10% -   16%)    0.240
VectorSearch                    989.57   (5.4%)                  1021.23   (5.0%)      3.2%  (  -6% -   14%)    0.051
OrHighMed                        58.35   (3.9%)                    69.23   (5.8%)     18.6%  (   8% -   29%)    0.000
OrHighHigh                       11.04   (3.4%)                    16.84   (6.2%)     52.5%  (  41% -   64%)    0.000
```

wikibigall run 3
```
Task                      QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                   p-value
Fuzzy1                           56.20  (11.1%)                    49.60  (12.0%)    -11.7%  ( -31% -   12%)    0.001
TermMonthSort                    61.43  (11.3%)                    57.85  (14.1%)     -5.8%  ( -28% -   21%)    0.148
TermTitleSort                   109.97  (11.2%)                   103.85  (14.1%)     -5.6%  ( -27% -   22%)    0.167
TermDTSort                      160.77  (10.8%)                   151.92  (13.5%)     -5.5%  ( -26% -   21%)    0.156
TermDayOfYearSort                55.50   (7.1%)                    52.92  (15.5%)     -4.6%  ( -25% -   19%)    0.222
TermGroup10K                     10.30   (4.6%)                    10.02   (7.4%)     -2.7%  ( -14% -    9%)    0.160
Term                           1037.48   (5.2%)                  1010.63   (7.6%)     -2.6%  ( -14% -   10%)    0.210
TermBGroup1M                     21.54   (5.0%)                    21.00   (7.4%)     -2.5%  ( -14% -   10%)    0.212
TermGroup100                     18.89   (4.4%)                    18.46   (7.8%)     -2.3%  ( -13% -   10%)    0.255
TermDateFacets                   10.29   (9.2%)                    10.11   (9.5%)     -1.8%  ( -18% -   18%)    0.536
TermBGroup1M1P                   43.52   (4.9%)                    42.88   (5.6%)     -1.5%  ( -11% -    9%)    0.373
Fuzzy2                           56.25  (13.4%)                    55.53  (12.5%)     -1.3%  ( -24% -   28%)    0.754
TermGroup1M                      22.31   (3.8%)                    22.04   (5.2%)     -1.2%  (  -9% -    8%)    0.389
AndMedOrHighHigh                 28.60   (2.5%)                    28.31   (2.7%)     -1.0%  (  -6% -    4%)    0.222
Phrase                           59.81   (2.9%)                    59.43   (3.1%)     -0.6%  (  -6% -    5%)    0.498
PKLookup                        205.40   (3.8%)                   204.10   (4.9%)     -0.6%  (  -8% -    8%)    0.648
VectorSearch                   1033.68   (4.0%)                  1027.88   (4.3%)     -0.6%  (  -8% -    8%)    0.670
BrowseDateTaxoFacets              7.27   (6.9%)                     7.24   (7.0%)     -0.4%  ( -13% -   14%)    0.859
BrowseDayOfYearTaxoFacets         6.97   (6.6%)                     6.94   (6.8%)     -0.4%  ( -12% -   13%)    0.854
SloppyPhrase                     18.29   (2.0%)                    18.22   (2.8%)     -0.4%  (  -5% -    4%)    0.612
BrowseMonthTaxoFacets             8.05   (6.9%)                     8.02   (7.0%)     -0.3%  ( -13% -   14%)    0.891
AndHighOrMedMed                  23.88   (2.7%)                    23.83   (2.3%)     -0.2%  (  -5% -    4%)    0.774
IntervalsOrdered                  3.83   (2.5%)                     3.83   (2.6%)     -0.1%  (  -5% -    5%)    0.862
IntNRQ                          123.08  (14.8%)                   122.93  (15.0%)     -0.1%  ( -26% -   34%)    0.979
Wildcard                         58.03   (2.7%)                    57.97   (3.1%)     -0.1%  (  -5% -    5%)    0.901
BrowseDayOfYearSSDVFacets        16.93   (1.7%)                    16.91   (1.5%)     -0.1%  (  -3% -    3%)    0.851
Prefix3                         165.67  (10.5%)                   165.54   (9.6%)     -0.1%  ( -18% -   22%)    0.980
SpanNear                          4.76   (1.3%)                     4.77   (1.0%)      0.0%  (  -2% -    2%)    0.915
BrowseMonthSSDVFacets            18.78   (1.4%)                    18.80   (1.3%)      0.1%  (  -2% -    2%)    0.815
Respell                          47.08   (4.1%)                    47.19   (4.1%)      0.2%  (  -7% -    8%)    0.851
AndHighHigh                      17.36   (3.4%)                    17.50   (3.1%)      0.8%  (  -5% -    7%)    0.435
AndHighMed                       32.21   (3.6%)                    32.50   (3.2%)      0.9%  (  -5% -    7%)    0.406
OrHighMed                        33.59   (3.2%)                    37.09   (3.8%)     10.4%  (   3% -   18%)    0.000
OrHighHigh                       10.82   (3.7%)                    17.08   (4.1%)     57.8%  (  48% -   68%)    0.000
```
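
For anyone who wants to sanity-check the `Pct diff` column against the two QPS columns, here is a minimal sketch. It assumes `Pct diff` is the relative change in mean QPS from baseline to `my_modified_version`; that definition is an assumption, not something confirmed by the benchmark tool's source, and the class and method names (`PctDiffCheck`, `pctDiff`) are hypothetical. The inputs are the OrHighHigh rows from the three runs above.

```java
import java.util.Locale;

public class PctDiffCheck {
  /** Relative change from baseline to candidate QPS, in percent (assumed definition). */
  public static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // OrHighHigh mean QPS pairs {baseline, my_modified_version}, copied from runs 1-3.
    double[][] orHighHigh = {{16.93, 25.19}, {11.04, 16.84}, {10.82, 17.08}};
    for (int run = 0; run < orHighHigh.length; run++) {
      double diff = pctDiff(orHighHigh[run][0], orHighHigh[run][1]);
      // Locale.ROOT keeps the decimal separator stable across machines.
      System.out.println(String.format(Locale.ROOT, "run %d: %.2f%%", run + 1, diff));
    }
  }
}
```

Because the tables report QPS rounded to two decimals, values recomputed this way can differ from the reported `Pct diff` in the last digit.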