[ https://issues.apache.org/jira/browse/LUCENE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-4609:
---------------------------------------

    Attachment: LUCENE-4609.patch

Here's another attempt (totally prototype / not committable) at using PackedInts to hold the ords ...

It's hacked up: it visits all byte[] from DocValues in the index and converts to in-RAM PackedInts arrays, and then does all facet counting from those arrays.

But, the performance is sort of 'meh':

{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
MedTerm             109.40   (1.5%)    102.06   (1.5%)   -6.7% (  -9% -  -3%)
AndHighLow          374.95   (3.0%)    361.19   (2.6%)   -3.7% (  -8% -   1%)
AndHighMed          172.57   (1.5%)    169.35   (1.1%)   -1.9% (  -4% -   0%)
Prefix3             177.54   (6.2%)    174.26   (8.0%)   -1.8% ( -15% -  13%)
IntNRQ              116.07   (7.5%)    113.97   (9.3%)   -1.8% ( -17% -  16%)
Fuzzy2               86.19   (2.4%)     85.16   (2.8%)   -1.2% (  -6% -   4%)
AndHighHigh          46.76   (1.4%)     46.36   (1.1%)   -0.8% (  -3% -   1%)
LowTerm             146.56   (1.8%)    145.58   (1.4%)   -0.7% (  -3% -   2%)
HighTerm             26.35   (2.0%)     26.20   (2.1%)   -0.6% (  -4% -   3%)
MedSpanNear          64.98   (2.3%)     64.62   (2.8%)   -0.5% (  -5% -   4%)
LowSloppyPhrase      67.07   (2.3%)     66.80   (3.6%)   -0.4% (  -6% -   5%)
OrHighMed            25.18   (1.6%)     25.10   (2.1%)   -0.3% (  -3% -   3%)
Wildcard            256.33   (3.1%)    255.56   (3.5%)   -0.3% (  -6% -   6%)
PKLookup            305.42   (2.3%)    304.72   (2.1%)   -0.2% (  -4% -   4%)
OrHighLow            24.59   (1.3%)     24.54   (2.2%)   -0.2% (  -3% -   3%)
Fuzzy1               81.38   (3.0%)     81.60   (2.7%)    0.3% (  -5% -   6%)
Respell             141.17   (3.8%)    141.87   (3.9%)    0.5% (  -6% -   8%)
LowSpanNear          38.34   (3.2%)     38.78   (3.0%)    1.1% (  -4% -   7%)
MedSloppyPhrase      63.80   (2.1%)     64.53   (3.5%)    1.1% (  -4% -   6%)
HighSpanNear         10.20   (2.8%)     10.32   (3.1%)    1.2% (  -4% -   7%)
MedPhrase           103.16   (4.5%)    104.72   (2.1%)    1.5% (  -4% -   8%)
OrHighHigh           17.81   (1.5%)     18.18   (2.7%)    2.1% (  -2% -   6%)
LowPhrase            58.77   (5.5%)     60.49   (3.0%)    2.9% (  -5% -  12%)
HighPhrase           38.68  (10.0%)     40.46   (5.6%)    4.6% ( -10% -  22%)
HighSloppyPhrase      2.97   (7.9%)      3.22  (12.6%)    8.3% ( -11% -  31%)
{noformat}

Maybe if I used the bulk read PackedInts APIs instead it would be better...

> Write a PackedIntsEncoder/Decoder for facets
> --------------------------------------------
>
>                 Key: LUCENE-4609
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4609
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-4609.patch, LUCENE-4609.patch, LUCENE-4609.patch
>
>
> Today the facets API lets you write IntEncoder/Decoder to encode/decode the
> category ordinals. We have several such encoders, including VInt (default),
> and block encoders.
> It would be interesting to implement and benchmark a
> PackedIntsEncoder/Decoder, with potentially two variants: (1) receives
> bitsPerValue up front, when you e.g. know that you have a small taxonomy and
> the max value you can see and (2) one that decides for each doc on the
> optimal bitsPerValue, writes it as a header in the byte[] or something.
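
For readers unfamiliar with the idea, variant (2) from the issue description can be sketched in plain Java as below. This is only an illustration under stated assumptions, not the attached patch and not Lucene's PackedInts or IntEncoder API: the class name PackedOrdsSketch, its methods, and the header layout (one byte for bitsPerValue followed by a 4-byte value count) are all hypothetical. The count is stored because trailing padding bits would otherwise be ambiguous when bitsPerValue is smaller than 8.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: per-document bit packing of category ordinals. */
public class PackedOrdsSketch {

    /** Bits needed for the largest ordinal in this document (at least 1). */
    public static int bitsRequired(int[] ords) {
        int or = 0;
        for (int v : ords) {
            if (v < 0) {
                throw new IllegalArgumentException("ordinals must be non-negative");
            }
            or |= v; // the OR of all values has the same highest set bit as the max
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    /** Assumed layout: [bitsPerValue: 1 byte][count: 4 bytes][packed values]. */
    public static byte[] encode(int[] ords) {
        int bpv = bitsRequired(ords);
        int payloadBytes = (int) (((long) ords.length * bpv + 7) / 8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + payloadBytes);
        buf.put((byte) bpv);
        buf.putInt(ords.length);
        long acc = 0;    // bit accumulator; only the low accBits bits are pending
        int accBits = 0;
        for (int v : ords) {
            acc = (acc << bpv) | v;
            accBits += bpv;
            while (accBits >= 8) { // flush whole bytes, most significant first
                accBits -= 8;
                buf.put((byte) (acc >>> accBits));
            }
        }
        if (accBits > 0) { // left-align the remaining bits, zero-padded
            buf.put((byte) (acc << (8 - accBits)));
        }
        return buf.array();
    }

    /** Reads the header, then unpacks exactly count values. */
    public static int[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int bpv = buf.get();
        int count = buf.getInt();
        int[] ords = new int[count];
        long mask = (1L << bpv) - 1;
        long acc = 0;
        int accBits = 0;
        for (int i = 0; i < count; i++) {
            while (accBits < bpv) { // refill one byte at a time
                acc = (acc << 8) | (buf.get() & 0xFF);
                accBits += 8;
            }
            accBits -= bpv;
            ords[i] = (int) ((acc >>> accBits) & mask);
        }
        return ords;
    }

    public static void main(String[] args) {
        int[] ords = {3, 17, 42, 5};
        byte[] packed = encode(ords);
        System.out.println("bitsPerValue=" + packed[0] + ", bytes=" + packed.length);
        System.out.println(java.util.Arrays.toString(decode(packed)));
    }
}
```

Variant (1) would be the same loop with bitsPerValue passed in by the caller and no per-document header byte. A real encoder would likely use a more compact header than the 5 bytes assumed here, since that overhead can outweigh the savings for documents with only a few ordinals.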