[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-3892:
---------------------------------------

    Attachment: LUCENE-3892-direct-IntBuffer.patch

The For index is 5.2 GB vs 4.9 GB for vInt: not bad to have only 5% increase in index size when using For PF (10M wikipedia index).

{quote}
Get more direct access to the file as an int[]; eg MMapDir could expose an IntBuffer from its ByteBuffer (saving the initial copy into byte[] that we now do).
{quote}

I tested this, by making hacked up changes to Billy's For patch requiring MMapDirectory and pulling an IntBuffer directly from its ByteBuffer, saving one copy of bytes into the byte[] first.

But, curiously, it didn't seem to improve things much:

{noformat}
Task              QPS base  StdDev base  QPS for  StdDev for      Pct diff
AndHighMed           24.32         0.60    14.24        0.41  -44% - -38%
PKLookup            131.98         3.09   108.35        1.47  -20% - -14%
AndHighHigh           5.36         0.18     4.66        0.02  -16% -  -9%
Phrase                1.48         0.02     1.33        0.10  -18% -  -2%
SloppyPhrase          1.40         0.04     1.26        0.03  -13% -  -5%
SpanNear              1.14         0.01     1.04        0.02  -10% -  -6%
IntNRQ               12.13         0.70    11.27        0.46  -15% -   2%
Prefix3              34.51         1.17    34.11        1.28   -8% -   6%
Fuzzy1               90.63         1.74    89.68        1.46   -4% -   2%
Respell              77.22         2.62    76.99        1.62   -5% -   5%
Wildcard             11.84         0.40    12.20        0.37   -3% -   9%
Fuzzy2               34.34         0.82    36.16        1.08    0% -  11%
TermBGroup1M1P        4.71         0.11     5.02        0.18    0% -  12%
OrHighMed             7.87         0.28     8.50        0.55   -2% -  19%
TermBGroup1M          3.47         0.03     3.78        0.03    7% -  11%
TermGroup1M           2.96         0.01     3.25        0.03    8% -  11%
OrHighHigh            3.55         0.12     3.91        0.21    0% -  20%
Term                  9.72         0.28    10.87        0.44    4% -  19%
{noformat}

Maybe, instead, reading into an int[] and decoding from an int array (hopefully avoiding bounds checks) will be faster than calling IntBuffer.get for each encoded int...

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-direct-IntBuffer.patch,
> LUCENE-3892_for.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch,
> LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
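
For reference, the two access patterns contrasted at the end of the comment (calling IntBuffer.get once per encoded int vs. one bulk read into an int[] and decoding from the array) might look roughly like the sketch below. This is only an illustration under stated assumptions: the class name, the method names, and the simple fixed-width bit packing are all hypothetical and are not taken from Billy's For patch or from LUCENE-3892-direct-IntBuffer.patch. It shows the shape of the two read paths and makes no claim about which one is faster.

```java
import java.nio.IntBuffer;

// Hypothetical sketch: fixed-width bit packing, NOT the on-disk format of the For patch.
final class IntBlockDecodeSketch {

    /** Packs values (each fitting in bitsPerValue bits, 1..32) into 32-bit words, low bits first. */
    static int[] pack(int[] values, int bitsPerValue) {
        int[] packed = new int[(values.length * bitsPerValue + 31) / 32];
        long bitPos = 0;
        for (int v : values) {
            int word = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            packed[word] |= v << shift;
            if (shift + bitsPerValue > 32) {
                // The value straddles two words: spill its high bits into the next word.
                packed[word + 1] |= v >>> (32 - shift);
            }
            bitPos += bitsPerValue;
        }
        return packed;
    }

    /** Path 1: one (or two) absolute IntBuffer.get(index) calls per decoded value. */
    static void decodePerGet(IntBuffer buf, int bitsPerValue, int[] out) {
        int base = buf.position();
        int mask = -1 >>> (32 - bitsPerValue);
        long bitPos = 0;
        for (int i = 0; i < out.length; i++) {
            int word = base + (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            int v = buf.get(word) >>> shift;
            if (shift + bitsPerValue > 32) {
                v |= buf.get(word + 1) << (32 - shift);
            }
            out[i] = v & mask;
            bitPos += bitsPerValue;
        }
    }

    /** Path 2: a single bulk get into a reusable int[] scratch array, then decode from the array. */
    static void decodeViaBulkCopy(IntBuffer buf, int bitsPerValue, int[] scratch, int[] out) {
        int words = (out.length * bitsPerValue + 31) / 32;
        // duplicate() so the caller's position is left untouched.
        buf.duplicate().get(scratch, 0, words);
        decodeFromArray(scratch, bitsPerValue, out);
    }

    /** Plain array decode loop, used by the bulk-copy path. */
    static void decodeFromArray(int[] packed, int bitsPerValue, int[] out) {
        int mask = -1 >>> (32 - bitsPerValue);
        long bitPos = 0;
        for (int i = 0; i < out.length; i++) {
            int word = (int) (bitPos >>> 5);
            int shift = (int) (bitPos & 31);
            int v = packed[word] >>> shift;
            if (shift + bitsPerValue > 32) {
                v |= packed[word + 1] << (32 - shift);
            }
            out[i] = v & mask;
            bitPos += bitsPerValue;
        }
    }
}
```

Both paths decode the same block to the same values; they differ only in whether each word is fetched through the buffer or from a local array. The IntBuffer here could come from ByteBuffer.asIntBuffer() on a mapped buffer, as in the experiment described in the comment. Whether the bulk copy actually helps would still need to be measured with the same benchmark.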