[ https://issues.apache.org/jira/browse/LUCENE-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638971#comment-13638971 ]
Han Jiang commented on LUCENE-2962:
-----------------------------------

Oh, sorry I didn't make it clear: all the tests above were already run on wikimediumfull, which uses WIKI_MEDIUM_TASKS_10MDOCS_FILE. The crazyMinShouldMatch tasks benefit a lot from the skipper (as expected, given the crazy avg_len :) ). The results are below:

{noformat}
                Task    QPS base    StdDev    QPS comp    StdDev                Pct diff
   10Terms8High10MSM      322.25    (2.5%)       97.87    (0.9%)  -69.6% ( -71% -  -67%)
   10Terms4High10MSM      449.00    (2.1%)      194.73    (1.2%)  -56.6% ( -58% -  -54%)
   10Terms6High10MSM      611.10    (2.6%)      327.45    (1.4%)  -46.4% ( -49% -  -43%)
   10Terms2High10MSM      614.20    (2.6%)      472.07    (1.9%)  -23.1% ( -26% -  -19%)
    10Terms6High8MSM       61.24    (5.9%)       56.10    (5.6%)   -8.4% ( -18% -    3%)
    10Terms4High6MSM      104.63    (4.9%)      100.22    (5.0%)   -4.2% ( -13% -    5%)
    10Terms4High2MSM        6.31    (7.8%)        6.12    (8.7%)   -3.0% ( -18% -   14%)
    10Terms6High4MSM        1.75    (6.6%)        1.70    (7.3%)   -2.9% ( -15% -   11%)
    10Terms2High4MSM       31.74    (6.5%)       30.85    (7.4%)   -2.8% ( -15% -   11%)
    10Terms2High2MSM        5.30    (7.0%)        5.16    (8.0%)   -2.6% ( -16% -   13%)
    10Terms8High4MSM        0.87    (5.8%)        0.85    (6.3%)   -2.4% ( -13% -   10%)
    10Terms0High8MSM      216.98    (4.1%)      211.76    (4.9%)   -2.4% ( -10% -    6%)
    10Terms6High2MSM        0.92    (5.3%)        0.90    (6.0%)   -2.3% ( -12% -    9%)
    10Terms2High8MSM      115.45    (4.8%)      113.28    (5.1%)   -1.9% ( -11% -    8%)
    10Terms4High8MSM      209.93    (4.4%)      206.04    (4.8%)   -1.9% ( -10% -    7%)
    10Terms8High8MSM       11.03    (6.8%)       10.85    (8.1%)   -1.7% ( -15% -   14%)
    10Terms6High6MSM        9.30    (6.8%)        9.15    (8.0%)   -1.7% ( -15% -   14%)
    10Terms0High2MSM       27.76    (6.9%)       27.30    (8.4%)   -1.6% ( -15% -   14%)
    10Terms4High3MSM        4.34    (7.0%)        4.27    (8.2%)   -1.6% ( -15% -   14%)
    10Terms8High6MSM        3.06    (7.1%)        3.01    (8.3%)   -1.5% ( -15% -   14%)
    10Terms8High2MSM        2.33    (6.5%)        2.30    (7.5%)   -1.2% ( -14% -   13%)
    10Terms4High4MSM        8.77    (6.6%)        8.67    (8.1%)   -1.2% ( -14% -   14%)
    10Terms0High6MSM       77.21    (5.7%)       76.71    (5.9%)   -0.7% ( -11% -   11%)
    10Terms2High6MSM       73.82    (5.7%)       73.40    (6.1%)   -0.6% ( -11% -   11%)
    10Terms0High4MSM       63.80    (5.9%)       63.64    (6.3%)   -0.2% ( -11% -   12%)
   10Terms0High10MSM      595.12    (2.4%)      595.54    (2.4%)    0.1% (  -4% -    5%)
            PKLookup      244.34    (3.1%)      259.97    (3.0%)    6.4% (   0% -   12%)
{noformat}

> Skip data should be inlined into the postings lists
> ---------------------------------------------------
>
>                 Key: LUCENE-2962
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2962
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>              Labels: gsoc2013
>         Attachments: proposal.txt
>
>
> Today, we store all skip data as a separate blob at the end of a given term's
> postings (if that term occurs in enough docs to warrant skip data).
> But this adds overhead during decoding -- we have to seek to a different
> place for the initial load, we have to init separate readers, we have to seek
> again while using the lower levels of the skip data, etc. Also, we have to
> fully decode all skip information even if we are not going to use it (e.g. if I
> only want docIDs, I still must decode position offset and lastPayloadLength).
> If instead we interleaved skip data into the postings file, we could keep it
> local, and "private" to each file that needs skipping. This should make it
> less costly to init and then use the skip data, which would be a good perf gain
> for, e.g., PhraseQuery and AndQuery.
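To make the inlining idea concrete, here is a minimal toy sketch in Java, assuming a simplified in-memory layout: docIDs are delta-coded in fixed-size blocks, and each block is preceded by an inlined skip entry holding that block's last docID. `advance(target)` steps over whole blocks by reading only that entry, so there is no separate skip blob to seek to and no separate skip reader to initialize. The class `InlinedSkipPostings`, its block layout, and the `blocksDecoded` counter are hypothetical illustrations; this is not Lucene's actual postings format or codec API, and it models only a single skip level.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not Lucene's codec): postings split into blocks,
 * each block preceded by an inlined skip entry (its last docID). Skipped
 * blocks are never delta-decoded.
 */
final class InlinedSkipPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** One block: the inlined skip entry followed by delta-coded docIDs. */
    private static final class Block {
        final int lastDoc;   // skip entry stored right before the block's postings
        final int[] deltas;  // docID gaps, decoded only if the block is entered

        Block(int lastDoc, int[] deltas) {
            this.lastDoc = lastDoc;
            this.deltas = deltas;
        }
    }

    private final List<Block> blocks = new ArrayList<>();
    private int blockIdx = 0;       // index of the current block
    private int[] decoded = null;   // absolute docIDs of the current block, once decoded
    private int posInBlock = 0;     // cursor inside the decoded block
    int blocksDecoded = 0;          // instrumentation: blocks actually decoded

    /** sortedDocIds must be strictly increasing and >= 0; blockSize must be > 0. */
    InlinedSkipPostings(int[] sortedDocIds, int blockSize) {
        int prev = 0;
        for (int start = 0; start < sortedDocIds.length; start += blockSize) {
            int end = Math.min(start + blockSize, sortedDocIds.length);
            int[] deltas = new int[end - start];
            for (int i = start; i < end; i++) {
                deltas[i - start] = sortedDocIds[i] - prev;
                prev = sortedDocIds[i];
            }
            blocks.add(new Block(sortedDocIds[end - 1], deltas));
        }
    }

    /** Returns the first docID >= target, or NO_MORE_DOCS. */
    int advance(int target) {
        // Skip whole blocks by consulting only their inlined skip entries.
        while (blockIdx < blocks.size() && blocks.get(blockIdx).lastDoc < target) {
            blockIdx++;
            decoded = null;
        }
        if (blockIdx == blocks.size()) {
            return NO_MORE_DOCS;
        }
        if (decoded == null) {
            decodeCurrentBlock();
        }
        // lastDoc >= target, so a match is guaranteed inside this block.
        while (decoded[posInBlock] < target) {
            posInBlock++;
        }
        return decoded[posInBlock];
    }

    private void decodeCurrentBlock() {
        Block b = blocks.get(blockIdx);
        // The previous block's skip entry is the delta base for this block.
        int doc = blockIdx == 0 ? 0 : blocks.get(blockIdx - 1).lastDoc;
        decoded = new int[b.deltas.length];
        for (int i = 0; i < b.deltas.length; i++) {
            doc += b.deltas[i];
            decoded[i] = doc;
        }
        posInBlock = 0;
        blocksDecoded++;
    }

    public static void main(String[] args) {
        int[] docs = {1, 3, 5, 7, 10, 12, 14, 16, 20, 25, 30, 35};
        InlinedSkipPostings postings = new InlinedSkipPostings(docs, 4);
        System.out.println(postings.advance(12));    // 12 (block [1..7] skipped undecoded)
        System.out.println(postings.advance(26));    // 30
        System.out.println(postings.blocksDecoded);  // 2 (of 3 blocks)
    }
}
```

A real on-disk format would also need to write file pointers and position/payload state at each block boundary. The sketch only shows the locality argument: the data consulted while skipping sits directly in front of the postings it lets the reader skip.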