[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123072#comment-13123072 ]
Robert Muir commented on LUCENE-1536:
-------------------------------------

Here are the results. F0.1, for example, means a filter accepting a random 0.1% of documents.

{noformat}
Task                QPS trunk  StdDev trunk  QPS patch  StdDev patch  Pct diff
PhraseF0.1              67.61          1.89      29.85          2.52  -60% - -50%
PhraseF0.5              20.08          0.72      13.09          1.11  -42% - -26%
PhraseF1.0              12.37          0.46       8.84          0.88  -37% - -18%
OrHighHighF0.1          78.84          1.19      59.96          2.87  -28% - -19%
TermF0.5               133.27          4.80     125.91          7.29  -14% - 3%
OrHighHigh              12.73          0.45      12.13          0.92  -14% - 6%
Fuzzy1                  57.63          1.70      56.62          2.33  -8% - 5%
Fuzzy2                  96.92          2.25      96.19          2.63  -5% - 4%
AndHighHighF100.0       16.99          0.50      16.92          1.38  -11% - 10%
AndHighHighF99.0        17.00          0.48      16.94          1.37  -10% - 10%
AndHighHighF95.0        17.00          0.48      16.98          1.35  -10% - 10%
Fuzzy2F0.1             107.24          2.74     107.29          2.68  -4% - 5%
AndHighHighF90.0        17.04          0.47      17.13          1.36  -9% - 11%
Fuzzy1F0.1              74.60          1.58      75.03          1.55  -3% - 4%
SloppyPhraseF100.0       7.82          0.16       7.89          0.24  -4% - 6%
SloppyPhraseF99.0        7.82          0.16       7.92          0.23  -3% - 6%
Fuzzy2F100.0            97.16          2.31      98.43          2.19  -3% - 6%
PKLookup               171.71          6.83     174.15          7.28  -6% - 10%
WildcardF0.1            67.96          1.06      69.08          1.95  -2% - 6%
Wildcard                43.40          0.89      44.13          0.92  -2% - 5%
Fuzzy2F99.0             96.83          2.46      98.49          2.21  -3% - 6%
Fuzzy2F95.0             97.01          2.47      98.79          2.18  -2% - 6%
SpanNearF100.0           3.11          0.04       3.18          0.09  -1% - 6%
AndHighHighF75.0        17.13          0.48      17.57          1.36  -7% - 13%
Fuzzy2F90.0             97.01          2.53      99.49          2.10  -2% - 7%
OrHighHighF0.5          31.57          0.45      32.41          1.07  -2% - 7%
SloppyPhraseF95.0        7.82          0.18       8.03          0.25  -2% - 8%
SpanNearF99.0            3.11          0.04       3.20          0.09  -1% - 7%
AndHighHighF0.1        136.96          3.21     140.94          5.15  -3% - 9%
SloppyPhraseF0.1        56.27          0.88      57.97          1.47  -1% - 7%
Fuzzy2F0.5             100.39          2.48     103.57          2.47  -1% - 8%
PhraseF2.0               7.95          0.31       8.20          0.65  -8% - 15%
AndHighHigh             17.97          0.46      18.55          0.84  -3% - 10%
TermF0.1               351.76          9.38     363.42         16.25  -3% - 10%
SloppyPhrase             7.90          0.16       8.19          0.19  0% - 8%
Phrase                   3.69          0.12       3.83          0.13  -3% - 10%
WildcardF0.5            62.57          0.88      65.31          2.07  0% - 9%
SloppyPhraseF90.0        7.83          0.16       8.18          0.24  0% - 9%
Fuzzy2F75.0             96.77          2.46     101.14          2.41  0% - 9%
SpanNear                 3.15          0.04       3.30          0.07  1% - 8%
Term                    71.54          4.98      74.98          5.61  -9% - 21%
SpanNearF95.0            3.11          0.05       3.26          0.09  0% - 9%
PhraseF100.0             3.49          0.13       3.68          0.15  -2% - 14%
PhraseF99.0              3.49          0.12       3.69          0.15  -2% - 14%
SpanNearF0.1            31.54          0.48      33.49          0.73  2% - 10%
PhraseF95.0              3.49          0.12       3.72          0.16  -1% - 15%
SpanNearF90.0            3.12          0.04       3.35          0.09  3% - 11%
Fuzzy2F50.0             97.08          2.32     104.79          2.66  2% - 13%
PhraseF90.0              3.49          0.13       3.78          0.16  0% - 17%
Fuzzy1F100.0            47.68          1.41      52.27          1.08  4% - 15%
Fuzzy1F99.0             47.57          1.49      52.28          1.19  4% - 16%
AndHighHighF50.0        17.30          0.48      19.12          1.47  0% - 22%
WildcardF1.0            58.03          0.81      64.32          2.40  5% - 16%
Fuzzy1F95.0             47.59          1.50      52.84          1.17  5% - 17%
SloppyPhraseF75.0        7.85          0.15       8.73          0.24  6% - 16%
Fuzzy2F1.0              98.59          2.36     110.12          2.89  6% - 17%
Fuzzy1F90.0             47.51          1.40      53.54          1.09  7% - 18%
PhraseF75.0              3.51          0.13       3.98          0.18  4% - 22%
TermF1.0                92.28          3.05     104.56          7.44  1% - 25%
WildcardF99.0           36.01          0.76      40.88          1.16  8% - 19%
Fuzzy1F0.5              59.00          1.10      67.10          1.36  9% - 18%
WildcardF100.0          35.92          0.79      40.86          1.19  8% - 19%
WildcardF95.0           36.01          0.75      41.02          1.19  8% - 19%
WildcardF90.0           36.06          0.70      41.14          1.20  8% - 19%
Fuzzy2F20.0             98.32          2.34     112.69          2.91  9% - 20%
WildcardF75.0           36.19          0.62      41.69          1.15  10% - 20%
AndHighHighF0.5         49.93          1.37      57.85          4.13  4% - 27%
Fuzzy1F75.0             47.25          1.50      55.55          1.11  11% - 23%
Fuzzy2F10.0             98.47          2.46     116.18          3.00  12% - 24%
WildcardF50.0           36.77          0.55      43.44          1.29  12% - 23%
OrHighHighF1.0          24.37          0.38      28.99          1.90  9% - 28%
Fuzzy1F2.0              52.64          1.05      63.12          1.32  15% - 24%
SpanNearF75.0            3.11          0.04       3.74          0.10  15% - 24%
Fuzzy2F5.0              97.96          2.31     118.02          3.48  14% - 27%
Fuzzy2F2.0              98.02          2.22     119.13          3.42  15% - 27%
OrHighHighF100.0         7.70          0.34       9.51          0.34  13% - 33%
OrHighHighF99.0          7.70          0.36       9.56          0.34  14% - 34%
Fuzzy1F50.0             47.46          1.24      59.15          1.18  19% - 30%
PhraseF50.0              3.57          0.12       4.45          0.23  14% - 35%
OrHighHighF95.0          7.73          0.35       9.73          0.35  16% - 36%
SloppyPhraseF50.0        7.92          0.16      10.09          0.28  21% - 33%
WildcardF2.0            53.32          0.69      68.29          3.44  20% - 36%
OrHighHighF90.0          7.77          0.35       9.97          0.35  18% - 39%
WildcardF20.0           41.13          0.60      54.63          2.12  25% - 39%
OrHighHighF75.0          7.91          0.32      10.73          0.36  26% - 45%
WildcardF5.0            47.44          0.57      65.42          3.11  29% - 46%
WildcardF10.0           44.01          0.53      61.16          2.61  31% - 46%
Fuzzy1F20.0             49.57          1.20      69.49          1.70  33% - 47%
Fuzzy1F1.0              54.39          1.07      76.95          2.03  35% - 48%
AndHighHighF1.0         34.63          1.07      50.01          4.02  28% - 60%
PhraseF5.0               5.16          0.20       7.61          0.75  27% - 68%
Fuzzy1F10.0             50.23          1.07      75.36          2.11  42% - 57%
OrHighHighF50.0          8.36          0.29      12.58          0.48  39% - 61%
OrHighHighF2.0          19.65          0.34      29.58          2.27  36% - 65%
SpanNearF50.0            3.11          0.04       4.76          0.12  47% - 58%
TermF2.0                68.99          2.38     106.22          8.65  36% - 72%
Fuzzy1F5.0              50.74          1.06      79.90          2.38  49% - 65%
PhraseF20.0              3.81          0.13       6.10          0.45  43% - 78%
TermF50.0               42.19          1.41      67.96          4.63  45% - 77%
TermF75.0               41.36          1.46      67.47          5.30  45% - 82%
TermF90.0               41.05          1.47      68.08          5.85  46% - 86%
TermF95.0               41.03          1.49      68.08          6.14  45% - 87%
PhraseF10.0              4.22          0.16       7.02          0.62  46% - 87%
TermF99.0               40.99          1.56      68.31          6.21  45% - 89%
TermF100.0              40.88          1.61      68.28          6.32  45% - 89%
SloppyPhraseF0.5        18.81          0.30      31.53          0.96  59% - 75%
AndHighHighF20.0        17.62          0.52      30.63          2.79  53% - 95%
OrHighHighF5.0          14.99          0.29      27.44          1.98  66% - 100%
SpanNearF0.5             9.17          0.12      17.12          0.42  79% - 93%
TermF20.0               45.25          1.50      84.63          6.04  68% - 107%
OrHighHighF20.0         10.35          0.25      19.60          1.08  74% - 104%
TermF5.0                52.49          1.71      99.90          8.02  69% - 112%
AndHighHighF2.0         25.97          0.81      50.45          4.72  70% - 119%
OrHighHighF10.0         12.36          0.22      24.25          1.56  80% - 112%
TermF10.0               46.97          1.47      92.60          7.08  76% - 119%
SloppyPhraseF20.0        8.18          0.16      16.35          0.58  89% - 111%
SpanNearF1.0             6.05          0.09      12.21          0.28  94% - 109%
AndHighHighF10.0        18.44          0.55      40.77          4.15  92% - 151%
AndHighHighF5.0         20.34          0.63      50.83          5.67  115% - 186%
SloppyPhraseF10.0        8.52          0.17      22.79          0.96  151% - 184%
SpanNearF20.0            3.15          0.05       9.03          0.24  174% - 198%
SloppyPhraseF1.0        13.62          0.23      42.77          2.29  192% - 236%
SpanNearF2.0             4.45          0.06      14.31          0.37  209% - 234%
SloppyPhraseF5.0         9.12          0.17      29.98          1.41  207% - 250%
SloppyPhraseF2.0        10.85          0.19      38.31          2.00  229% - 278%
SpanNearF10.0            3.25          0.05      13.71          0.39  303% - 339%
SpanNearF5.0             3.52          0.05      19.51          0.67  428% - 481%
{noformat}

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia. Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries. 1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4. "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop. Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).
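
For readers unfamiliar with the two styles compared in the issue description, here is a minimal sketch of the difference. It is not Lucene code: the class and method names are hypothetical, java.util.BitSet stands in for a filter's doc-id set, and a sorted int array stands in for the query's matching docs. The iterator style advances the query and the filter in lockstep; the random-access style simply tests each query match against the filter's bits.

```java
import java.util.BitSet;

// Hypothetical illustration only; none of these names come from Lucene.
final class FilterApplicationSketch {

    /** Iterator style: leapfrog the query matches and the filter's set bits. */
    static int countWithIterator(int[] sortedQueryDocs, BitSet filter) {
        int count = 0;
        int i = 0;
        int filterDoc = filter.nextSetBit(0); // -1 when the filter is exhausted
        while (i < sortedQueryDocs.length && filterDoc >= 0) {
            int queryDoc = sortedQueryDocs[i];
            if (queryDoc == filterDoc) {
                count++; // both sides agree on this doc
                i++;
                filterDoc = filter.nextSetBit(filterDoc + 1);
            } else if (queryDoc < filterDoc) {
                i++; // query is behind: move it forward
            } else {
                filterDoc = filter.nextSetBit(queryDoc); // filter is behind: advance it
            }
        }
        return count;
    }

    /** Random-access style: check each query match directly against the bits. */
    static int countWithRandomAccess(int[] sortedQueryDocs, BitSet filter) {
        int count = 0;
        for (int doc : sortedQueryDocs) {
            if (filter.get(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet filter = new BitSet();
        filter.set(2);
        filter.set(5);
        filter.set(9);
        filter.set(12);
        int[] queryDocs = {1, 2, 3, 9, 10, 12, 15};
        System.out.println(countWithIterator(queryDocs, filter));     // prints 3
        System.out.println(countWithRandomAccess(queryDocs, filter)); // prints 3
    }
}
```

Both methods return the same count; they differ only in how the filter is consulted, which is the cost the benchmark above measures in real Lucene code.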