[ https://issues.apache.org/jira/browse/LUCENE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6458:
---------------------------------
    Attachment: LUCENE-6458.patch
                wikimedium.10M.nostopwords.tasks

I did some more benchmarking of the change with filters (see attached tasks file) and various thresholds (and a fixed seed):

{noformat}
16
Task        QPS baseline   StdDev   QPS patch   StdDev   Pct diff
MTQ                24.33   (7.5%)       20.67   (7.3%)   -15.1% ( -27% -    0%)
IntNRQ             20.38   (7.3%)       17.85  (11.9%)   -12.4% ( -29% -    7%)
IntNRQ_50           8.94  (10.1%)        8.67   (8.6%)    -3.0% ( -19% -   17%)
MTQ_50              9.05   (7.9%)        8.93   (5.3%)    -1.3% ( -13% -   12%)
IntNRQ_10          13.72  (12.7%)       13.60  (11.9%)    -0.9% ( -22% -   27%)
IntNRQ_1           17.53  (17.1%)       17.53  (16.3%)     0.0% ( -28% -   40%)
MTQ_10             13.70  (11.2%)       13.89   (8.7%)     1.4% ( -16% -   23%)
MTQ_1              19.11  (15.8%)       21.43  (18.0%)    12.1% ( -18% -   54%)

64
Task        QPS baseline   StdDev   QPS patch   StdDev   Pct diff
IntNRQ             20.53   (6.9%)       16.42   (5.3%)   -20.0% ( -30% -   -8%)
MTQ                24.31   (7.3%)       20.34   (6.4%)   -16.3% ( -27% -   -2%)
IntNRQ_50           8.87   (9.2%)        8.31   (6.5%)    -6.3% ( -20% -   10%)
IntNRQ_10          13.55  (12.7%)       12.80  (10.2%)    -5.6% ( -25% -   19%)
IntNRQ_1           17.27  (16.3%)       16.38  (13.1%)    -5.2% ( -29% -   28%)
MTQ_50              9.00   (7.6%)        9.02   (4.5%)     0.3% ( -10% -   13%)
MTQ_10             13.65  (11.1%)       14.73   (8.2%)     7.9% ( -10% -   30%)
MTQ_1              18.95  (15.1%)       25.32  (17.2%)    33.6% (   1% -   77%)

256
Task        QPS baseline   StdDev   QPS patch   StdDev   Pct diff
IntNRQ             20.43   (9.4%)       12.69   (1.7%)   -37.9% ( -44% -  -29%)
MTQ                24.13   (9.3%)       19.32   (5.3%)   -19.9% ( -31% -   -5%)
IntNRQ_1           17.21  (19.5%)       13.90   (7.7%)   -19.2% ( -38% -    9%)
IntNRQ_10          13.49  (12.7%)       10.95   (5.7%)   -18.8% ( -33% -    0%)
IntNRQ_50           8.85  (10.5%)        7.40   (3.8%)   -16.4% ( -27% -   -2%)
MTQ_50              8.94   (8.3%)        8.82   (4.4%)    -1.3% ( -12% -   12%)
MTQ_10             13.53  (12.6%)       14.64   (5.9%)     8.2% (  -9% -   30%)
MTQ_1              18.88  (15.6%)       26.52  (14.2%)    40.5% (   9% -   83%)

1024
Task        QPS baseline   StdDev   QPS patch   StdDev   Pct diff
IntNRQ             20.40   (7.7%)        6.54   (1.5%)   -67.9% ( -71% -  -63%)
IntNRQ_1           17.57  (17.2%)        8.27   (2.9%)   -52.9% ( -62% -  -39%)
IntNRQ_10          13.66  (13.0%)        6.72   (2.4%)   -50.8% ( -58% -  -40%)
IntNRQ_50           8.96  (10.4%)        5.01   (1.5%)   -44.1% ( -50% -  -35%)
MTQ                24.41   (8.2%)       18.07   (4.4%)   -26.0% ( -35% -  -14%)
MTQ_50              9.05   (8.1%)        8.65   (3.5%)    -4.5% ( -14% -    7%)
MTQ_10             13.60  (11.5%)       14.41   (3.9%)     6.0% (  -8% -   24%)
MTQ_1              19.11  (15.6%)       27.32  (12.9%)    43.0% (  12% -   84%)
{noformat}

Rewriting to a BooleanQuery never helps when there is no filter, but something that the benchmark doesn't capture is that at least BooleanQuery does not allocate O(maxDoc) memory, which can matter for large datasets.

When there are filters, it's more complicated: it depends on the density of the filter, on the number of terms, and apparently also on how the frequencies of the different terms compare (this is my current theory for why WildcardQuery performs better than NRQ).

Net/net I think this validates that 64 would be a good threshold to rewrite, with a minimal slowdown when filters are dense and interesting speedups when filters are sparse?

> MultiTermQuery's FILTER rewrite method should support skipping whenever possible
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-6458
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6458
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6458.patch, LUCENE-6458.patch, wikimedium.10M.nostopwords.tasks
>
>
> Today MultiTermQuery's FILTER rewrite always builds a bit set from all
> matching terms. This means that we need to consume the entire postings lists
> of all matching terms. Instead we should try to execute like regular
> disjunctions when there are few terms.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
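
To make the trade-off in the comment above concrete, here is a minimal, self-contained Java sketch of the two execution strategies. It is not the attached patch and does not use Lucene's API: the class name, the method names, the plain `int[]` postings and filter, and the `<=` comparison against the threshold are all assumptions made for illustration. Only the value 64 and the general idea (a bit set over all postings versus a disjunction that can skip) come from the discussion.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Lucene. */
final class MultiTermRewriteSketch {
    /** Assumed cutoff: the threshold the benchmark comment argues for. */
    static final int REWRITE_THRESHOLD = 64;

    enum Strategy { DISJUNCTION, BIT_SET }

    /** Few matching terms: run as a disjunction. Many: build a bit set. */
    static Strategy choose(int matchingTerms) {
        return matchingTerms <= REWRITE_THRESHOLD ? Strategy.DISJUNCTION : Strategy.BIT_SET;
    }

    /** Bit-set path: reads every posting of every term and allocates maxDoc bits. */
    static List<Integer> viaBitSet(int[][] postings, int[] filter, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] termPostings : postings) {
            for (int doc : termPostings) {
                bits.set(doc);
            }
        }
        List<Integer> hits = new ArrayList<>();
        for (int doc : filter) {
            if (bits.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Disjunction path: the filter leads and each term cursor skips ahead to it. */
    static List<Integer> viaDisjunction(int[][] postings, int[] filter) {
        int[] cursor = new int[postings.length];
        List<Integer> hits = new ArrayList<>();
        for (int target : filter) {
            boolean match = false;
            for (int t = 0; t < postings.length; t++) {
                cursor[t] = advance(postings[t], cursor[t], target);
                if (cursor[t] < postings[t].length && postings[t][cursor[t]] == target) {
                    match = true;
                }
            }
            if (match) {
                hits.add(target);
            }
        }
        return hits;
    }

    /** Binary search from index {@code from} for the first entry >= target. */
    static int advance(int[] sorted, int from, int target) {
        int lo = from;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Both paths return the same hits for sorted postings and a sorted filter. In this sketch, the disjunction path does work proportional to the filter size times the number of terms, which is consistent with the observation that sparse filters favor the rewrite while a large term count makes it costly; the bit-set path always pays for every posting plus O(maxDoc) memory.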