[ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560129#comment-13560129 ]
Michael McCandless commented on LUCENE-4609:
--------------------------------------------

Ugh! My DV total bytes numbers were too high: luceneutil also indexes the title field as DV.

So ignore past byte sizes ... here are the [correct, I hope!] byte sizes for the NO_PARENTS case, full 6.6M Wikipedia en index: DV (index) 151208 KB, int[] (in RAM): 305889 KB.

And NO_PARENTS perf (base = trunk, comp = int[] collector):

{noformat}
            Task  QPS base   StdDev  QPS comp   StdDev  Pct diff
        Wildcard     74.70   (3.3%)     74.32   (1.9%)   -0.5% ( -5% -   4%)
        PKLookup    245.87   (1.8%)    244.80   (2.0%)   -0.4% ( -4% -   3%)
      HighPhrase     15.68   (5.7%)     15.72   (6.4%)    0.2% (-11% -  12%)
         Respell    111.09   (3.5%)    111.33   (3.7%)    0.2% ( -6% -   7%)
      AndHighLow     97.90   (1.6%)     98.16   (1.4%)    0.3% ( -2% -   3%)
     LowSpanNear      7.62   (3.8%)      7.67   (3.5%)    0.7% ( -6% -   8%)
         Prefix3     45.94   (5.6%)     46.34   (2.7%)    0.9% ( -6% -   9%)
          IntNRQ     18.04   (8.2%)     18.20   (4.6%)    0.9% (-11% -  14%)
 LowSloppyPhrase     17.77   (2.9%)     17.94   (4.8%)    1.0% ( -6% -   8%)
          Fuzzy2     41.36   (2.4%)     42.68   (2.3%)    3.2% ( -1% -   8%)
       LowPhrase     16.94   (2.4%)     17.65   (3.5%)    4.1% ( -1% -  10%)
    HighSpanNear      2.98   (2.8%)      3.14   (2.1%)    5.3% (  0% -  10%)
      AndHighMed     49.18   (1.0%)     51.97   (0.7%)    5.7% (  3% -   7%)
HighSloppyPhrase      0.90   (6.7%)      0.97  (12.6%)    6.8% (-11% -  27%)
 MedSloppyPhrase     18.54   (1.8%)     19.91   (3.0%)    7.4% (  2% -  12%)
     MedSpanNear     19.86   (1.6%)     21.36   (2.0%)    7.5% (  3% -  11%)
       MedPhrase     55.57   (2.2%)     60.31   (2.3%)    8.5% (  3% -  13%)
          Fuzzy1     33.38   (1.4%)     37.19   (1.9%)   11.4% (  8% -  14%)
     AndHighHigh     12.58   (1.2%)     14.66   (0.9%)   16.6% ( 14% -  18%)
         LowTerm     40.41   (1.2%)     47.14   (1.4%)   16.6% ( 13% -  19%)
         MedTerm     23.00   (1.4%)     27.14   (3.0%)   18.0% ( 13% -  22%)
       OrHighMed      7.50   (2.2%)     10.16   (2.3%)   35.6% ( 30% -  40%)
       OrHighLow      7.55   (2.0%)     10.30   (2.8%)   36.3% ( 30% -  41%)
        HighTerm      7.92   (1.9%)     10.98   (2.8%)   38.6% ( 33% -  44%)
      OrHighHigh      4.30   (2.7%)      6.39   (3.0%)   48.6% ( 41% -  55%)
{noformat}

> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch,
> LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and
> the max value you can see and (2) one that decides for each doc on the
> optimal bitsPerValue, writes it as a header in the byte[] or something.
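
For illustration only, the two variants in the issue description could be sketched roughly as follows. This is plain Java with no Lucene dependency; it is not the facets IntEncoder/IntDecoder API and not the attached patch. The class name PackedOrdinalsSketch, the 1-byte bitsPerValue header, and the 4-byte value count that follows it are all assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical sketch: packs one document's category ordinals at a fixed bit width. */
public class PackedOrdinalsSketch {

    /** Variant (2): smallest width that fits the largest ordinal (at least 1 bit). */
    static int bitsRequired(int[] ordinals) {
        int max = 0;
        for (int v : ordinals) {
            if (v < 0) throw new IllegalArgumentException("ordinals must be non-negative");
            max = Math.max(max, v);
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(max));
    }

    /** Variant (2): choose bitsPerValue per document and write it as a header. */
    static byte[] encode(int[] ordinals) {
        return encode(ordinals, bitsRequired(ordinals));
    }

    /** Variant (1): the caller supplies bitsPerValue up front (e.g. small taxonomy). */
    static byte[] encode(int[] ordinals, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 31) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..31");
        }
        int payloadBytes = (int) (((long) ordinals.length * bitsPerValue + 7) / 8);
        byte[] out = new byte[5 + payloadBytes];
        // Assumed header layout: [bitsPerValue][count as 4 big-endian bytes].
        out[0] = (byte) bitsPerValue;
        out[1] = (byte) (ordinals.length >>> 24);
        out[2] = (byte) (ordinals.length >>> 16);
        out[3] = (byte) (ordinals.length >>> 8);
        out[4] = (byte) ordinals.length;
        int bitPos = 0;
        for (int v : ordinals) {
            if (v < 0 || (v >>> bitsPerValue) != 0) {
                throw new IllegalArgumentException(v + " does not fit in " + bitsPerValue + " bits");
            }
            // Write the value most-significant bit first.
            for (int b = bitsPerValue - 1; b >= 0; b--) {
                if (((v >>> b) & 1) != 0) {
                    out[5 + (bitPos >>> 3)] |= (byte) (0x80 >>> (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    /** Reads the header, then unpacks count values of bitsPerValue bits each. */
    static int[] decode(byte[] buf) {
        int bitsPerValue = buf[0];
        int count = ((buf[1] & 0xFF) << 24) | ((buf[2] & 0xFF) << 16)
                  | ((buf[3] & 0xFF) << 8) | (buf[4] & 0xFF);
        int[] ordinals = new int[count];
        int bitPos = 0;
        for (int i = 0; i < count; i++) {
            int v = 0;
            for (int b = 0; b < bitsPerValue; b++) {
                int bit = (buf[5 + (bitPos >>> 3)] >>> (7 - (bitPos & 7))) & 1;
                v = (v << 1) | bit;
                bitPos++;
            }
            ordinals[i] = v;
        }
        return ordinals;
    }

    public static void main(String[] args) {
        int[] ords = {3, 17, 42, 1000};
        byte[] packed = encode(ords);
        System.out.println("bitsPerValue=" + packed[0] + ", bytes=" + packed.length);
        System.out.println(Arrays.toString(decode(packed)));
    }
}
```

A real encoder would probably store the count more compactly (or infer it from the buffer length), since a 5-byte header is heavy for documents with only a few ordinals. Whether per-document widths beat VInt would need the kind of benchmark shown above.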