[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-3892:
---------------------------------------

    Attachment: LUCENE-3892-BlockTermScorer.patch

I was curious how much the "layers" (SepPostingsReader, FixedIntBlock.IntIndexInput, ForFactor) between the FOR block decode and the query scoring were hurting performance, so I wrote a specialized scorer (BlockTermScorer) for just TermQuery.

The scorer is only used if the postings format is ForPF, and if no skipping will be done (I didn't implement advance...). The scorer reaches down and holds on to the decoded int[] buffer, and then does its own adding up of the doc deltas, reading the next block, etc.

The baseline is the current branch (not trunk!):

{noformat}
Task              QPS base StdDev base  QPS patch StdDev patch      Pct diff
Wildcard             10.31        0.40      10.10         0.17    -7% -   3%
AndHighHigh           4.90        0.10       4.82         0.15    -6% -   3%
Prefix3              28.50        1.06      28.11         0.50    -6% -   4%
IntNRQ                9.72        0.46       9.60         0.57   -11% -   9%
SloppyPhrase          0.92        0.03       0.92         0.02    -6% -   5%
PKLookup            106.21        2.54     105.66         2.07    -4% -   3%
Phrase                1.56        0.00       1.56         0.01    -1% -   0%
Fuzzy1               90.33        3.48      90.19         2.25    -6% -   6%
Fuzzy2               29.66        0.61      29.64         0.85    -4% -   4%
AndHighMed           14.87        0.29      15.02         0.81    -6% -   8%
Respell              78.83        2.46      79.62         1.54    -3% -   6%
SpanNear              1.18        0.02       1.19         0.04    -4% -   6%
TermGroup1M           2.78        0.06       3.28         0.14    10% -  25%
OrHighHigh            4.19        0.24       5.04         0.20     9% -  32%
OrHighMed             8.21        0.45       9.87         0.23    11% -  30%
TermBGroup1M1P        5.11        0.20       6.21         0.26    12% -  31%
TermBGroup1M          4.49        0.11       5.49         0.27    13% -  31%
Term                  8.89        0.58      11.90         1.52     9% -  61%
{noformat}

Seems like we get a good boost removing the abstractions.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892_for.patch,
> LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch,
> LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch,
> LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_settings.patch,
> LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
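
As a rough illustration of the approach described in the comment (a scorer that holds on to the decoded int[] buffer and sums the doc deltas itself, refilling from the next block when the buffer runs out), here is a minimal Java sketch. It is not the code in LUCENE-3892-BlockTermScorer.patch and uses no Lucene classes: BlockSource, BlockDocIterator, arraySource and collect are hypothetical names, the block source is a plain array standing in for the FOR decoder, and the sketch assumes the first delta is the first doc ID. Like the patch, it has no advance().

```java
import java.util.ArrayList;
import java.util.List;

final class BlockDeltaSketch {
    // Sentinel returned once the postings are exhausted (assumed convention).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for the block decoder. */
    interface BlockSource {
        /** Fills buffer with the next doc deltas; returns how many, or 0 when done. */
        int readBlock(int[] buffer);
    }

    /** Iterates doc IDs by summing deltas straight out of the decoded buffer. */
    static final class BlockDocIterator {
        private final BlockSource source;
        private final int[] deltas; // decoded block, reused for every refill
        private int count;          // valid entries currently in deltas
        private int pos;            // next entry to consume
        private int doc;            // running sum of deltas = current doc ID

        BlockDocIterator(BlockSource source, int blockSize) {
            this.source = source;
            this.deltas = new int[blockSize];
        }

        int nextDoc() {
            if (pos == count) {            // buffer consumed: read the next block
                count = source.readBlock(deltas);
                pos = 0;
                if (count == 0) {
                    return NO_MORE_DOCS;
                }
            }
            doc += deltas[pos++];          // no intermediate layers, just an add
            return doc;
        }
    }

    /** Serves deltas from an in-memory array, one block at a time. */
    static BlockSource arraySource(final int[] all) {
        return new BlockSource() {
            private int upto;

            @Override
            public int readBlock(int[] buffer) {
                int n = Math.min(buffer.length, all.length - upto);
                System.arraycopy(all, upto, buffer, 0, n);
                upto += n;
                return n;
            }
        };
    }

    /** Drains an iterator over the given deltas into a list of doc IDs. */
    static List<Integer> collect(int[] allDeltas, int blockSize) {
        BlockDocIterator it = new BlockDocIterator(arraySource(allDeltas), blockSize);
        List<Integer> docs = new ArrayList<>();
        for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
            docs.add(d);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Deltas 3,2,5,1,4 in blocks of 2 decode to doc IDs 3,5,10,11,15.
        System.out.println(collect(new int[] {3, 2, 5, 1, 4}, 2));
    }
}
```

The inner loop touches only the int[] and a running sum, which is the kind of short path the benchmark above suggests is worth having for plain term queries.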