[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396987#comment-13396987 ]
Han Jiang commented on LUCENE-3892:
-----------------------------------

Oh, thank you Mike! I hadn't thought much about those skipping policies.

bq. Up above, in ForFactory, when we readInt() to get numBytes ... it seems like we could stuff the header numBits into that same int and save checking that in FORUtil.decompress....

Ah, yes, I just forgot to remove the redundant code. Here is an initial attempt that removes the header and calls ForDecompressImpl directly in readBlock(), with For, blockSize=128. Numbers in parentheses show the prior benchmark.

{noformat}
Task             QPS Base  StdDev Base  QPS For  StdDev For  Pct diff
Phrase               4.99         0.37     3.57        0.26  -38% - -17%  (-44% - -18%)
AndHighMed          28.91         2.17    22.66        0.82  -29% - -12%  (-38% -  -9%)
SpanNear             2.72         0.14     2.22        0.13  -26% -  -8%  (-36% -  -8%)
SloppyPhrase         4.24         0.26     3.70        0.16  -21% -  -3%  (-33% -  -6%)
Respell             40.71         2.59    37.66        1.36  -16% -   2%  (-18% -   0%)
Fuzzy1              43.22         2.01    40.66        0.32  -10% -   0%  (-12% -   0%)
Fuzzy2              16.25         0.90    15.64        0.26  -10% -   3%  (-12% -   3%)
Wildcard            19.07         0.86    19.07        0.73   -8% -   8%  (-21% -   3%)
AndHighHigh          7.76         0.47     7.77        0.15   -7% -   8%  (-21% -  10%)
PKLookup            87.50         4.56    88.51        1.24   -5% -   8%  ( -2% -   5%)
TermBGroup1M        20.42         0.87    21.32        0.74   -3% -  12%  (  2% -  10%)
OrHighMed            5.33         0.68     5.61        0.14   -9% -  23%  (-16% -  25%)
OrHighHigh           4.43         0.53     4.69        0.12   -8% -  23%  (-15% -  24%)
TermGroup1M         13.30         0.34    14.31        0.40    2% -  13%  (  0% -  13%)
TermBGroup1M1P      20.92         0.59    23.71        0.86    6% -  20%  ( -1% -  22%)
Prefix3             30.30         1.41    35.14        1.76    5% -  27%  (-14% -  21%)
IntNRQ               3.90         0.54     4.58        0.47   -7% -  50%  (-25% -  33%)
Term                42.17         1.55    52.33        2.57   13% -  35%  (  1% -  33%)
{noformat}

The improvement is quite general. However, I suspect it comes mostly from fewer method calls. I'm now changing the PFor code to remove those nested calls as well.

bq. Get more direct access to the file as an int[]; ...

OK, I'll look into this once pfor+pulsing is completed. I'm just curious why we don't have readInts in ora.util yet...

bq. Skipping: can we partially decode a block? ...
The pfor-opt approach (which encodes the lower bits of each exception in the normal area and the remaining bits in the exception area) naturally fits "partially decode a block"; that will become possible when we optimize skipping queries.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892_for.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
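
For readers following the header discussion in the comment above (stuffing numBits into the same int that already carries numBytes), the idea might look roughly like the sketch below. This is not the code from the attached patches: the class name BlockHeader, the method names, and the choice of a 6-bit field for numBits are all assumptions made for illustration.

```java
// Hypothetical sketch: pack a block's byte length and its bit width into one int,
// so the reader needs a single readInt() and no separate per-block header.
final class BlockHeader {
    // Assumption: bit widths range over 0..32, which fits in 6 bits.
    private static final int BITS_FOR_NUM_BITS = 6;
    private static final int NUM_BITS_MASK = (1 << BITS_FOR_NUM_BITS) - 1;

    private BlockHeader() {}

    /** Combines numBytes (upper 26 bits) and numBits (lower 6 bits) into one int. */
    public static int encode(int numBytes, int numBits) {
        if (numBits < 0 || numBits > 32) {
            throw new IllegalArgumentException("numBits out of range: " + numBits);
        }
        if (numBytes < 0 || numBytes > (Integer.MAX_VALUE >>> BITS_FOR_NUM_BITS)) {
            throw new IllegalArgumentException("numBytes out of range: " + numBytes);
        }
        return (numBytes << BITS_FOR_NUM_BITS) | numBits;
    }

    /** Extracts the compressed block length in bytes. */
    public static int numBytes(int header) {
        return header >>> BITS_FOR_NUM_BITS;
    }

    /** Extracts the number of bits used per value in the block. */
    public static int numBits(int header) {
        return header & NUM_BITS_MASK;
    }

    public static void main(String[] args) {
        // A block of 128 values at 7 bits each occupies 128 * 7 / 8 = 112 bytes.
        int header = encode(112, 7);
        System.out.println("numBytes=" + numBytes(header) + " numBits=" + numBits(header));
    }
}
```

With a layout like this, the reader could dispatch straight to the decoder for the extracted bit width after one readInt(), which is the kind of saving the quoted suggestion describes.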