[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-3892:
---------------------------------------

    Attachment: LUCENE-3892-bulkVInt.patch

I tested BulkVInt again, ie to decouple the cutover from Sep to BlockPF vs the vInt/FOR change.

Base=Lucene40, comp=BlockPF(BulkVInt):

{noformat}
Task              QPS base StdDev base QPS bulkVInt StdDev bulkVInt     Pct diff
AndHighLow          857.35       20.10       614.20           10.73  -31% - -25%
Respell              62.99        2.35        60.53            1.34     -9% - 2%
AndHighMed           65.64        2.24        63.61            0.93     -7% - 1%
Fuzzy2               62.83        1.75        61.72            1.31     -6% - 3%
PKLookup            195.97        1.87       194.73            5.00     -4% - 2%
IntNRQ               12.50        0.10        12.43            1.49   -13% - 12%
Fuzzy1               72.68        1.12        73.84            0.88     -1% - 4%
HighPhrase            1.75        0.05         1.78            0.08     -5% - 8%
LowSpanNear           9.01        0.12         9.27            0.13      0% - 5%
LowPhrase            19.73        0.43        20.64            0.15      1% - 7%
MedSpanNear           4.52        0.06         4.74            0.01      3% - 6%
MedPhrase            11.74        0.31        12.40            0.09      2% - 9%
LowTerm             435.96       13.41       467.22            9.10     1% - 12%
Prefix3              75.47        0.51        81.52            4.38     1% - 14%
Wildcard             48.66        0.44        52.79            2.79     1% - 15%
OrHighHigh           10.11        0.63        11.06            0.32     0% - 20%
OrHighMed            20.85        1.31        22.99            0.63     0% - 20%
HighSpanNear          1.50        0.02         1.67            0.01     8% - 13%
OrHighLow            23.55        1.46        26.51            0.76     2% - 23%
LowSloppyPhrase       6.45        0.14         7.37            0.18     9% - 19%
MedTerm             163.46       10.30       188.55            5.22     5% - 26%
MedSloppyPhrase       5.74        0.12         6.65            0.15    10% - 20%
HighSloppyPhrase      1.69        0.04         1.98            0.11     8% - 26%
AndHighHigh          19.00        0.53        22.91            0.24    16% - 25%
HighTerm             28.28        1.95        34.48            0.99    10% - 34%
{noformat}

Base=BlockPF(BulkVInt), comp=BlockPF(FOR):

{noformat}
Task              QPS base StdDev base      QPS for      StdDev for     Pct diff
IntNRQ               12.10        1.70        11.61            0.02   -16% - 11%
HighSloppyPhrase      2.00        0.11         1.95            0.03     -8% - 4%
HighPhrase            1.85        0.05         1.81            0.07     -8% - 4%
Wildcard             52.32        3.09        52.49            0.24     -5% - 7%
LowSloppyPhrase       7.41        0.24         7.43            0.19     -5% - 6%
MedSloppyPhrase       6.69        0.18         6.72            0.21     -5% - 6%
OrHighMed            22.99        0.55        23.23            0.85     -4% - 7%
Respell              61.99        2.01        62.70            1.57     -4% - 7%
OrHighLow            26.52        0.69        26.83            1.00     -5% - 7%
Fuzzy1               74.72        1.34        75.59            1.43     -2% - 4%
PKLookup            189.68        7.14       192.09            3.82     -4% - 7%
OrHighHigh           11.05        0.27        11.21            0.42     -4% - 7%
Fuzzy2               62.78        1.86        63.70            1.87     -4% - 7%
HighSpanNear          1.65        0.03         1.69            0.02      0% - 5%
Prefix3              80.25        5.44        82.57            1.03    -4% - 11%
AndHighHigh          22.79        0.11        23.53            0.13      2% - 4%
LowSpanNear           9.16        0.26         9.48            0.21     -1% - 8%
MedSpanNear           4.67        0.09         4.84            0.07      0% - 7%
MedPhrase            12.59        0.26        13.07            0.24      0% - 7%
LowPhrase            20.86        0.33        22.06            0.30      2% - 8%
AndHighLow          618.27       13.15       655.52            3.30      3% - 8%
HighTerm             33.95        1.11        36.02            0.08      2% - 9%
MedTerm             186.09        5.51       198.46            0.09      3% - 9%
AndHighMed           63.71        1.15        69.15            0.45     5% - 11%
LowTerm             469.17        7.25       514.55            2.83     7% - 12%
{noformat}

So ... most of the gains come from BlockPF cutover. This is sort of ... surprising/disappointing, ie, our bottlenecks are the abstraction layers, not the actual decode cost. Still it's good to make progress on removing the abstractions.

Also, it looks like the only query that is slower than Lucene40 is AndHighLow ... however, it's also an extremely fast query to begin with so I think it's a fine tradeoff that it gets slower while the hard/slower queries get faster.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch,
>                      LUCENE-3892-blockFor&hardcode(base).patch,
>                      LUCENE-3892-blockFor&packedecoder(comp).patch,
>                      LUCENE-3892-blockFor-with-packedints-decoder.patch,
>                      LUCENE-3892-blockFor-with-packedints-decoder.patch,
>                      LUCENE-3892-blockFor-with-packedints.patch,
>                      LUCENE-3892-bulkVInt.patch,
>                      LUCENE-3892-direct-IntBuffer.patch,
>                      LUCENE-3892-for&pfor-with-javadoc.patch,
>                      LUCENE-3892-handle_open_files.patch,
>                      LUCENE-3892-pfor-compress-iterate-numbits.patch,
>                      LUCENE-3892-pfor-compress-slow-estimate.patch,
>                      LUCENE-3892_for_byte[].patch,
>                      LUCENE-3892_for_int[].patch,
>                      LUCENE-3892_for_unfold_method.patch,
>                      LUCENE-3892_pfor_unfold_method.patch,
>                      LUCENE-3892_pulsing_support.patch,
>                      LUCENE-3892_settings.patch,
>                      LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html).
> I think this would make a good GSoC project.
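
For readers new to the encodings named above, here is a minimal Java sketch of the two decode styles the benchmarks compare: a vInt stream (7 payload bits per byte, high bit as a continuation flag) and a FOR block (every value in the block packed at one fixed bit width). The class and method names are hypothetical, this is not the code in any of the attached patches, and the real postings formats operate on doc/freq/position deltas with their own block size and file layout.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not taken from the LUCENE-3892 patches.
class IntBlockSketch {

    // vInt: 7 payload bits per byte, low-order group first;
    // the high bit is set on every byte except the last one of a value.
    static byte[] encodeVInts(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    // Decoding branches on every byte, so the loop is data dependent.
    static int[] decodeVInts(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int b = bytes[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
            }
            values[i] = v;
        }
        return values;
    }

    // FOR: number of bits needed by the largest value in the block.
    static int bitsRequired(int[] block) {
        int or = 0;
        for (int v : block) {
            or |= v; // OR-ing is enough to find the highest set bit
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    // Pack every value of the block at the same width into 64-bit words.
    static long[] encodeFor(int[] block, int bits) {
        long[] packed = new long[(block.length * bits + 63) / 64];
        for (int i = 0; i < block.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long v = block[i] & 0xFFFFFFFFL;
            packed[word] |= v << offset;
            if (offset + bits > 64) {
                // value straddles two words: spill the high bits
                packed[word + 1] |= v >>> (64 - offset);
            }
        }
        return packed;
    }

    // Decoding uses the same shift and mask for every value in the block.
    static int[] decodeFor(long[] packed, int bits, int count) {
        int[] block = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long v = packed[word] >>> offset;
            if (offset + bits > 64) {
                v |= packed[word + 1] << (64 - offset);
            }
            block[i] = (int) (v & mask);
        }
        return block;
    }

    public static void main(String[] args) {
        int[] deltas = {3, 1, 7, 2, 5, 0, 4, 6};
        int bits = bitsRequired(deltas);
        long[] packed = encodeFor(deltas, bits);
        byte[] vints = encodeVInts(deltas);
        System.out.println("FOR bits per value: " + bits);
        System.out.println("FOR round trip ok: "
            + Arrays.equals(deltas, decodeFor(packed, bits, deltas.length)));
        System.out.println("vInt bytes: " + vints.length);
        System.out.println("vInt round trip ok: "
            + Arrays.equals(deltas, decodeVInts(vints, deltas.length)));
    }
}
```

Small, uniform deltas are where FOR helps most on size: the eight values in `main` need 3 bits each (24 bits in total), against one byte each as vInts.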