[ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560048#comment-13560048 ]
Michael McCandless commented on LUCENE-4609:
--------------------------------------------

The above results were 1M index; here's the full wikipedia en (6.6M docs) results:

{noformat}
Task                QPS base   StdDev  QPS comp   StdDev  Pct diff
HighSpanNear            2.91   (2.1%)      2.90   (2.4%)   -0.6% ( -5% -   4%)
Prefix3                46.35   (4.0%)     46.07   (3.9%)   -0.6% ( -8% -   7%)
PKLookup              240.11   (1.4%)    238.95   (1.9%)   -0.5% ( -3% -   2%)
Wildcard               73.79   (2.2%)     73.48   (2.3%)   -0.4% ( -4% -   4%)
IntNRQ                 18.05   (6.1%)     18.01   (5.9%)   -0.2% (-11% -  12%)
Respell                96.78   (3.1%)     98.09   (3.3%)    1.3% ( -4% -   7%)
LowSloppyPhrase        17.63   (4.4%)     17.91   (3.8%)    1.6% ( -6% -  10%)
AndHighLow            108.80   (2.8%)    110.58   (4.2%)    1.6% ( -5% -   8%)
LowSpanNear             7.53   (4.8%)      7.67   (5.6%)    1.8% ( -8% -  12%)
HighSloppyPhrase        0.87  (10.1%)      0.90   (9.6%)    3.2% (-14% -  25%)
Fuzzy2                 42.22   (2.5%)     43.90   (2.7%)    4.0% ( -1% -   9%)
HighPhrase             15.32   (7.5%)     15.93   (5.4%)    4.0% ( -8% -  18%)
LowPhrase              17.09   (4.3%)     18.10   (2.9%)    5.9% ( -1% -  13%)
AndHighMed             52.60   (1.4%)     55.90   (2.1%)    6.3% (  2% -   9%)
MedSpanNear            20.09   (2.0%)     21.44   (1.8%)    6.7% (  2% -  10%)
MedSloppyPhrase        18.69   (3.0%)     20.00   (2.7%)    7.0% (  1% -  13%)
Fuzzy1                 33.68   (2.0%)     37.26   (2.2%)   10.6% (  6% -  15%)
MedPhrase              57.00   (2.9%)     63.56   (3.3%)   11.5% (  5% -  18%)
MedTerm                19.22   (1.2%)     21.70   (1.1%)   12.9% ( 10% -  15%)
LowTerm                41.98   (1.2%)     48.26   (1.8%)   15.0% ( 11% -  18%)
AndHighHigh            12.09   (1.0%)     13.98   (1.2%)   15.7% ( 13% -  18%)
HighTerm                7.11   (2.1%)      9.11   (2.0%)   28.1% ( 23% -  32%)
OrHighMed               6.67   (2.4%)      8.55   (2.1%)   28.2% ( 23% -  33%)
OrHighLow               6.76   (2.1%)      8.70   (2.3%)   28.6% ( 23% -  33%)
OrHighHigh              3.84   (2.5%)      5.33   (2.7%)   38.7% ( 32% -  45%)
{noformat}

On-disk size of _dv* is 464768 KB and in memory int[] is 669428 KB (44% more).

Next I'll try NO_PARENTS ord policy...

> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and
> the max value you can see and (2) one that decides for each doc on the
> optimal bitsPerValue, writes it as a header in the byte[] or something.
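
For readers unfamiliar with the idea, the two variants in the issue description could be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names (PackedOrdinalsSketch, encode, encodeWithHeader, ...) are hypothetical, it does not use Lucene's PackedInts or IntEncoder APIs, it is not the code in the attached patches, and it assumes ordinals are non-negative and that the decoder is told how many values to read.

```java
import java.util.Arrays;

// Hypothetical sketch of fixed-width bit packing for category ordinals.
// Not Lucene's PackedInts API and not the LUCENE-4609 patch.
class PackedOrdinalsSketch {

    // Variant 1: the caller supplies bitsPerValue (1..32) up front.
    public static byte[] encode(int[] values, int bitsPerValue) {
        int totalBits = values.length * bitsPerValue;
        byte[] out = new byte[(totalBits + 7) / 8];
        int bitPos = 0;
        for (int v : values) {
            // Write the low bitsPerValue bits of v, most significant first.
            for (int b = bitsPerValue - 1; b >= 0; b--, bitPos++) {
                if (((v >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= 0x80 >>> (bitPos & 7);
                }
            }
        }
        return out;
    }

    // Reads count values of bitsPerValue bits each, starting at byte offsetBytes.
    public static int[] decode(byte[] data, int offsetBytes, int bitsPerValue, int count) {
        int[] values = new int[count];
        int bitPos = offsetBytes * 8;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bitsPerValue; b++, bitPos++) {
                int bit = (data[bitPos >>> 3] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
            }
            values[i] = v;
        }
        return values;
    }

    // Variant 2: choose the smallest bitsPerValue that fits this doc's
    // largest ordinal and store it as a one-byte header before the packed bits.
    public static byte[] encodeWithHeader(int[] values) {
        int max = 0;
        for (int v : values) {
            max = Math.max(max, v); // assumes non-negative ordinals
        }
        int bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
        byte[] packed = encode(values, bitsPerValue);
        byte[] out = new byte[packed.length + 1];
        out[0] = (byte) bitsPerValue;
        System.arraycopy(packed, 0, out, 1, packed.length);
        return out;
    }

    public static int[] decodeWithHeader(byte[] data, int count) {
        return decode(data, 1, data[0], count);
    }

    public static void main(String[] args) {
        int[] ords = {3, 17, 250, 0, 42};
        byte[] perDoc = encodeWithHeader(ords);
        System.out.println("bitsPerValue=" + perDoc[0] + ", bytes=" + perDoc.length);
        System.out.println(Arrays.toString(decodeWithHeader(perDoc, ords.length)));
    }
}
```

The per-doc header costs one extra byte per document, so whether variant 2 beats a fixed bitsPerValue (or VInt) would depend on how the ordinals are distributed; that is the kind of question the benchmark in this issue is meant to answer.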