[ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Da Huang updated LUCENE-4396:
-----------------------------
    Attachment: tasks.cpp
                LUCENE-4396.patch
                And.tasks

The patch is based on git mirror commit 67d17eb81b754fa242bb91e1b91070fd8b38ecd9.

In this patch, I removed the unused classes, encapsulated some functions, and fixed some bugs.

Besides, the cases in the tasks file used before are heavily related to one another, which I think is not good. Therefore, I generated a new tasks file. And.tasks is the new tasks file, and tasks.cpp is the program that generates it. You can generate the tasks file by running
{code}
g++ tasks.cpp -std=c++0x -o tasks
./tasks < wikimedium.10M.nostopwords.tasks > And.tasks
{code}

The perf. on the new tasks file is as follows.
{code}
Task              QPS baseline   StdDev  QPS my_version   StdDev  Pct diff
HighAnd5LowNot            5.40   (5.1%)            4.88   (4.2%)    -9.6% ( -18% -    0%)
HighAnd5LowOr             7.05  (10.2%)            6.87   (3.8%)    -2.6% ( -15% -   12%)
LowAnd5LowNot            27.17   (2.1%)           26.47   (2.6%)    -2.6% (  -7% -    2%)
HighAnd5HighOr            1.13   (3.8%)            1.11   (2.2%)    -1.8% (  -7% -    4%)
LowAnd5LowOr             31.82   (2.6%)           31.35   (2.3%)    -1.5% (  -6% -    3%)
PKLookup                 98.80   (5.2%)          102.02   (6.3%)     3.3% (  -7% -   15%)
HighAnd5HighNot           1.95   (1.0%)            2.04   (2.1%)     4.7% (   1% -    7%)
LowAnd5HighNot            9.46   (2.9%)           10.32   (2.7%)     9.0% (   3% -   15%)
LowAnd5HighOr             7.56   (2.8%)            8.42   (2.8%)    11.4% (   5% -   17%)
LowAnd60HighOr            0.51   (2.5%)            0.82   (4.8%)    58.7% (  50% -   67%)
LowAnd60LowNot            2.61   (1.0%)            4.64   (3.4%)    78.0% (  72% -   83%)
HighAnd60LowNot           1.30   (1.2%)            2.36   (3.7%)    81.1% (  75% -   87%)
HighAnd60LowOr            1.18   (1.3%)            2.15   (3.7%)    82.0% (  76% -   88%)
LowAnd60LowOr             2.25   (0.6%)            4.61   (4.2%)   104.7% (  99% -  110%)
HighAnd60HighOr           0.10   (0.7%)            0.26   (4.8%)   151.2% ( 144% -  157%)
LowAnd60HighNot           0.53   (2.5%)            1.62   (8.0%)   204.0% ( 188% -  220%)
HighAnd60HighNot          0.14   (0.9%)            0.59   (8.9%)   328.4% ( 315% -  341%)
{code}

My next step is to do more tests to get better rules and to make sure of the correctness. I think it can be finished by this Friday.
As the suggested pencil down date is coming, I will then begin to scrub the code, improve the comments, and write the documentation.

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch,
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch,
> luceneutil-score-equal.patch, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared
> to the other clauses, that BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you
> are inspired!
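
To illustrate the issue description's last point (advancing the SHOULD clauses when the MUST clause jumps far ahead), here is a minimal, self-contained sketch. It is a toy model only, not the code in the attached patch and not Lucene's BooleanScorer. The class `MustDrivenSketch`, the `Postings` stand-in, and the `score` method are hypothetical names invented for this example. They only mimic the `nextDoc()` / `advance(target)` style of a doc ID iterator, and assume one MUST clause plus sorted doc ID lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy illustration only: hypothetical names, not Lucene's BooleanScorer or its API. */
public class MustDrivenSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a postings iterator over sorted doc IDs. */
    static final class Postings {
        private final int[] docs;
        private int pos = -1;

        Postings(int... docs) {
            this.docs = docs;
        }

        /** Current doc ID: -1 before iteration starts, NO_MORE_DOCS when exhausted. */
        int docID() {
            if (pos < 0) {
                return -1;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            pos++;
            return docID();
        }

        /** Moves to the first doc >= target. Linear scan here; a real index would use skip data. */
        int advance(int target) {
            do {
                pos++;
            } while (pos < docs.length && docs[pos] < target);
            return docID();
        }
    }

    /** For each doc matching the MUST clause, returns {docID, number of SHOULD clauses that match}. */
    static List<int[]> score(Postings must, List<Postings> shoulds) {
        List<int[]> hits = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != NO_MORE_DOCS; doc = must.nextDoc()) {
            int matched = 0;
            for (Postings should : shoulds) {
                // If the MUST clause jumped ahead, skip the SHOULD clause forward
                // in one call instead of stepping through every doc in between.
                if (should.docID() < doc) {
                    should.advance(doc);
                }
                if (should.docID() == doc) {
                    matched++;
                }
            }
            hits.add(new int[] {doc, matched});
        }
        return hits;
    }

    public static void main(String[] args) {
        Postings must = new Postings(3, 1_000_003);
        List<Postings> shoulds = Arrays.asList(
            new Postings(1, 2, 3, 50, 1_000_003),
            new Postings(3, 7, 9));
        for (int[] hit : score(must, shoulds)) {
            System.out.println("doc=" + hit[0] + " shouldMatches=" + hit[1]);
        }
    }
}
```

In this toy run the MUST clause jumps from doc 3 to doc 1000003, and each SHOULD list is moved forward with a single `advance` call rather than being stepped through one document at a time. Whether and when a real scorer should do this is exactly the heuristic question the issue raises.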