[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016491#comment-14016491 ]
Da Huang edited comment on LUCENE-4396 at 6/3/14 1:24 PM:
----------------------------------------------------------

A patch based on lucene github mirror commit cf10341825ff6bd1662dd48c51926bc51d751ce5.

I use a bitset to skip required docs when scanning optional and prohibited docs. The performance comparison is below.

I also built a new tasks file to test performance, and found that BNS speeds up the "+a -b -c -d ..." case considerably when "b c d ..." hit many docs.

{code}
BNS (without bitset) vs. BS2
Task                  QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndTonsLowNot             4.29   (2.9%)            1.08   (0.6%)    -74.8%  ( -76% - -73%)
HighAndTonsLowOr              4.87   (6.4%)            1.24   (1.0%)    -74.4%  ( -76% - -71%)
HighAndSomeLowNot             9.03   (5.2%)            4.11   (4.1%)    -54.4%  ( -60% - -47%)
HighAndSomeLowOr             16.21   (9.6%)            7.75   (4.1%)    -52.2%  ( -60% - -42%)
LowAndSomeLowOr             303.28   (2.4%)          183.14   (6.6%)    -39.6%  ( -47% - -31%)
LowAndSomeLowNot            257.24   (1.8%)          157.07   (6.5%)    -38.9%  ( -46% - -31%)
LowAndSomeHighOr             36.78   (1.9%)           33.74   (3.0%)     -8.3%  ( -12% - -3%)
LowAndTonsLowNot             21.28   (2.0%)           19.69   (6.9%)     -7.5%  ( -16% - 1%)
LowAndSomeHighNot            34.40   (1.6%)           33.69   (3.2%)     -2.1%  ( -6% - 2%)
PKLookup                    100.63   (4.8%)          103.46   (4.7%)      2.8%  ( -6% - 12%)
LowAndTonsHighOr              1.26   (1.6%)            1.41   (1.7%)     11.8%  ( 8% - 15%)
LowAndTonsLowOr              13.66   (0.9%)           15.50   (6.0%)     13.5%  ( 6% - 20%)
HighAndSomeHighNot            2.65   (1.4%)            3.12   (6.5%)     17.6%  ( 9% - 25%)
HighAndSomeHighOr             2.21   (2.4%)            2.62   (5.8%)     18.6%  ( 10% - 27%)
HighAndTonsHighOr             0.07   (0.8%)            0.19  (10.5%)    160.3%  ( 147% - 172%)
LowAndTonsHighNot             2.86   (1.6%)           10.24  (18.1%)    257.7%  ( 234% - 281%)
HighAndTonsHighNot            0.05   (0.8%)            0.40  (28.2%)    641.8%  ( 607% - 676%)

BS vs. BS2
Task                  QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndTonsLowOr              4.02   (6.8%)            0.87   (0.5%)    -78.2%  ( -80% - -76%)
HighAndTonsLowNot             4.95   (3.4%)            1.29   (0.9%)    -73.9%  ( -75% - -72%)
HighAndSomeLowOr             14.45   (9.5%)            6.68   (3.7%)    -53.8%  ( -61% - -44%)
HighAndSomeLowNot            14.78   (5.1%)            7.48   (3.9%)    -49.4%  ( -55% - -42%)
LowAndSomeLowOr             316.55   (2.2%)          170.14   (5.6%)    -46.3%  ( -52% - -39%)
LowAndSomeLowNot            283.47   (1.7%)          157.35   (6.0%)    -44.5%  ( -51% - -37%)
LowAndSomeHighOr             39.39   (2.0%)           35.07   (3.1%)    -11.0%  ( -15% - -6%)
LowAndSomeHighNot            53.96   (2.0%)           48.57   (3.8%)    -10.0%  ( -15% - -4%)
LowAndTonsLowNot             17.97   (1.5%)           17.04   (6.0%)     -5.2%  ( -12% - 2%)
PKLookup                     97.57   (2.7%)          100.21   (5.2%)      2.7%  ( -5% - 10%)
LowAndTonsHighOr              3.59   (1.7%)            3.74   (2.4%)      4.1%  ( 0% - 8%)
LowAndTonsLowOr              14.71   (1.3%)           15.63   (5.7%)      6.3%  ( 0% - 13%)
HighAndSomeHighNot            1.84   (1.3%)            2.05   (5.6%)     11.2%  ( 4% - 18%)
HighAndSomeHighOr             1.93   (2.1%)            2.16   (5.6%)     11.9%  ( 4% - 20%)
HighAndTonsHighOr             0.05   (1.0%)            0.13  (14.1%)    144.8%  ( 128% - 161%)
LowAndTonsHighNot             1.63   (1.9%)            4.95   (7.2%)    204.0%  ( 191% - 217%)
HighAndTonsHighNot            0.06   (1.0%)            0.34  (18.2%)    459.6%  ( 435% - 483%)

BNS (with bitset) vs. BS2
Task                  QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAndSomeLowOr              7.45  (12.0%)            3.49   (6.6%)    -53.1%  ( -64% - -39%)
HighAndSomeLowNot            10.45   (8.0%)            5.25   (6.8%)    -49.7%  ( -59% - -37%)
LowAndSomeLowOr             310.53   (2.3%)          168.56   (5.8%)    -45.7%  ( -52% - -38%)
LowAndSomeLowNot            292.05   (2.3%)          165.88   (5.7%)    -43.2%  ( -50% - -36%)
HighAndTonsLowNot             5.94   (3.5%)            4.33   (6.8%)    -27.0%  ( -36% - -17%)
HighAndTonsLowOr              5.92   (4.4%)            4.39   (6.0%)    -25.9%  ( -34% - -16%)
LowAndSomeHighNot            53.79   (2.4%)           47.71   (2.8%)    -11.3%  ( -16% - -6%)
LowAndSomeHighOr             31.03   (2.6%)           28.20   (2.4%)     -9.1%  ( -13% - -4%)
LowAndTonsLowOr              18.58   (1.1%)           17.60   (6.2%)     -5.3%  ( -12% - 2%)
HighAndSomeHighNot            1.49   (1.8%)            1.44   (8.9%)     -3.5%  ( -13% - 7%)
PKLookup                     96.96   (3.4%)          100.03   (5.1%)      3.2%  ( -5% - 12%)
LowAndTonsHighOr              2.06   (2.2%)            2.18   (2.3%)      5.9%  ( 1% - 10%)
LowAndTonsLowNot             13.63   (1.3%)           14.57   (6.3%)      6.9%  ( 0% - 14%)
HighAndSomeHighOr             2.03   (2.4%)            2.33   (8.1%)     14.5%  ( 3% - 25%)
HighAndTonsHighOr             0.07   (0.8%)            0.17  (13.6%)    158.2%  ( 142% - 174%)
LowAndTonsHighNot             1.40   (2.2%)            6.21  (11.3%)    344.2%  ( 323% - 365%)
HighAndTonsHighNot            0.07   (1.1%)            0.46  (24.2%)    572.1%  ( 540% - 604%)
{code}

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2.
> BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
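
For readers who want something concrete, here is a rough Python sketch of the two ideas in this thread: marking required (MUST) docs in a per-window bitset so that optional and prohibited clauses only touch those docs, and advancing the other clauses when the required clause jumps far ahead. It is not the code from LUCENE-4396.patch and is only one possible reading of the comment. The names `WINDOW`, `advance` and `score_windows` are hypothetical, and plain sorted lists stand in for Lucene postings iterators.

```python
from bisect import bisect_left

WINDOW = 8  # tiny window for illustration only (assumed value)


def advance(postings, pos, target):
    """Return the index of the first posting >= target, searching from pos."""
    return bisect_left(postings, target, pos)


def score_windows(required, optional_clauses, prohibited):
    """Return {doc_id: number of optional clauses matched} for every doc that
    is in `required` and not in `prohibited`.

    `required` and `prohibited` are sorted lists of doc IDs, and
    `optional_clauses` is a list of such lists.
    """
    hits = {}
    req_pos = 0
    pro_pos = 0
    opt_pos = [0] * len(optional_clauses)

    while req_pos < len(required):
        # Start the window at the next required doc, so a long gap in the
        # required clause is jumped over rather than scanned.
        base = required[req_pos] - required[req_pos] % WINDOW
        end = base + WINDOW

        # 1. Mark this window's required docs in a bitset (a Python int).
        bits = 0
        while req_pos < len(required) and required[req_pos] < end:
            bits |= 1 << (required[req_pos] - base)
            req_pos += 1

        # 2. Skip the prohibited clause to the window, then clear its docs.
        pro_pos = advance(prohibited, pro_pos, base)
        while pro_pos < len(prohibited) and prohibited[pro_pos] < end:
            bits &= ~(1 << (prohibited[pro_pos] - base))
            pro_pos += 1

        # 3. Skip each optional clause to the window and count a match only
        #    where the bit is still set.
        counts = {}
        for i, postings in enumerate(optional_clauses):
            pos = advance(postings, opt_pos[i], base)
            while pos < len(postings) and postings[pos] < end:
                doc = postings[pos]
                if (bits >> (doc - base)) & 1:
                    counts[doc] = counts.get(doc, 0) + 1
                pos += 1
            opt_pos[i] = pos

        # 4. Emit the surviving required docs in doc ID order.
        for offset in range(WINDOW):
            if (bits >> offset) & 1:
                hits[base + offset] = counts.get(base + offset, 0)
    return hits
```

With `required = [3, 5, 1000, 1001]`, the second window starts at doc 1000, so the optional and prohibited lists are advanced past everything between 8 and 999 without being scanned doc by doc. How large the window should be, and when this beats a plain conjunction, is the heuristic question the issue description raises; the sketch does not try to answer it.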