[ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576094#comment-13576094 ]
Michael McCandless commented on LUCENE-4764:
--------------------------------------------

I re-tested trunk vs this new DV format, with all 9 dims on the full 6.6M wikibig index. (The added 2 dims, username and categories, have many, many unique values):

{noformat}
Task                QPS base   StdDev   QPS comp   StdDev   Pct diff
HighPhrase             13.68   (8.1%)      13.64   (8.4%)    -0.3% ( -15% -  17%)
LowPhrase              15.05   (4.4%)      15.08   (4.4%)     0.1% (  -8% -   9%)
LowSpanNear             7.12   (2.5%)       7.17   (2.3%)     0.6% (  -4% -   5%)
AndHighLow             64.03   (1.3%)      64.55   (1.3%)     0.8% (  -1% -   3%)
HighSloppyPhrase        0.82   (5.7%)       0.83   (4.8%)     1.1% (  -8% -  12%)
Respell                44.90   (4.0%)      45.43   (4.3%)     1.2% (  -6% -   9%)
LowSloppyPhrase        15.37   (2.1%)      15.57   (1.8%)     1.3% (  -2% -   5%)
HighSpanNear            2.91   (1.8%)       2.95   (1.9%)     1.3% (  -2% -   5%)
Fuzzy2                 28.55   (2.0%)      29.02   (2.1%)     1.7% (  -2% -   5%)
MedSloppyPhrase        16.56   (1.2%)      16.94   (1.2%)     2.3% (   0% -   4%)
AndHighMed             39.47   (0.8%)      40.40   (1.0%)     2.4% (   0% -   4%)
Fuzzy1                 24.08   (1.3%)      24.73   (1.4%)     2.7% (   0% -   5%)
MedSpanNear            17.70   (1.6%)      18.19   (1.6%)     2.8% (   0% -   6%)
MedPhrase              41.06   (2.2%)      42.46   (2.6%)     3.4% (  -1% -   8%)
LowTerm                34.19   (0.9%)      35.69   (1.0%)     4.4% (   2% -   6%)
AndHighHigh            11.92   (1.2%)      12.50   (1.1%)     4.9% (   2% -   7%)
Wildcard               13.13   (1.8%)      14.43   (1.5%)     9.9% (   6% -  13%)
OrHighMed               7.09   (2.7%)       7.85   (1.6%)    10.8% (   6% -  15%)
OrHighLow               7.16   (2.3%)       7.93   (1.6%)    10.8% (   6% -  15%)
HighTerm                7.59   (2.3%)       8.47   (1.6%)    11.5% (   7% -  15%)
MedTerm                20.14   (1.9%)      22.82   (1.1%)    13.3% (  10% -  16%)
Prefix3                 5.78   (2.2%)       6.56   (1.5%)    13.4% (   9% -  17%)
OrHighHigh              4.03   (2.3%)       4.65   (2.0%)    15.4% (  10% -  20%)
IntNRQ                  1.92   (2.2%)       2.45   (1.9%)    27.5% (  22% -  32%)
{noformat}

145.3 MB for the new DV vs 129.0 MB for trunk = ~12.6% bigger.
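For readers who want to check the arithmetic, here is a minimal sketch of how the relative figures can be recomputed. It assumes "Pct diff" is the plain relative change of the comparison value over the baseline; the helper name `pct_diff` is made up for this illustration and is not part of the benchmark tooling. Because the table shows rounded QPS values, recomputed numbers can differ from the reported ones by about a tenth of a point.

```python
def pct_diff(base: float, comp: float) -> float:
    """Relative change of comp over base, in percent (assumed definition)."""
    return (comp - base) / base * 100.0


# Index size: trunk (base) vs the new DV format (comp), in MB, from the comment.
size_overhead = pct_diff(129.0, 145.3)
print(f"size overhead: {size_overhead:.1f}%")

# A few QPS rows from the table: (task, QPS base, QPS comp).
rows = [
    ("HighPhrase", 13.68, 13.64),
    ("MedTerm", 20.14, 22.82),
    ("IntNRQ", 1.92, 2.45),
]
for task, base_qps, comp_qps in rows:
    print(f"{task:12s} {pct_diff(base_qps, comp_qps):6.1f}%")
```

The size figure comes out at roughly 12.6%, matching the comment; the QPS rows land within rounding distance of the reported column.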
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.
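The trade-off described in the issue summary can be illustrated with a toy sketch. This is not Lucene's actual on-disk or in-RAM format; the block-plus-delta scheme and every name below (`build_direct`, `build_compact`, `start_direct`, `start_compact`) are assumptions chosen only to show why a more compact per-document address encoding costs extra work on each lookup.

```python
from array import array


def build_direct(lengths):
    """One absolute 64-bit start address per document: larger, trivial to read."""
    addrs = array("q")
    pos = 0
    for n in lengths:
        addrs.append(pos)
        pos += n
    return addrs


def build_compact(lengths, block=4):
    """One 64-bit base per block plus a 16-bit delta per document.

    Toy assumption: the values inside one block span fewer than 65536 bytes.
    """
    bases = array("q")
    deltas = array("H")
    pos = 0
    for doc, n in enumerate(lengths):
        if doc % block == 0:
            bases.append(pos)
        deltas.append(pos - bases[-1])
        pos += n
    return bases, deltas, block


def start_direct(addrs, doc):
    # A single array read per document.
    return addrs[doc]


def start_compact(compact, doc):
    # Two array reads plus a division and an addition per document.
    bases, deltas, block = compact
    return bases[doc // block] + deltas[doc]


lengths = [3, 0, 5, 2, 7, 1, 4]  # hypothetical per-document value sizes in bytes
direct = build_direct(lengths)
compact = build_compact(lengths)
assert all(start_direct(direct, d) == start_compact(compact, d)
           for d in range(len(lengths)))
```

Both layouts return the same start address; the compact one stores fewer bytes per document but does more work per lookup, which adds up when an address is decoded for every collected docID.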