[ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852568#action_12852568 ]
Toke Eskildsen commented on LUCENE-1990:
----------------------------------------

I've located the bug and fixed it. As expected, it was in the write-masks. Unfortunately I'm running out of time, so I cannot make a patch right now. The code for Packed64 is

{code}
  private static final long[][] WRITE_MASKS =
          new long[ENTRY_SIZE][ENTRY_SIZE * FAC_BITPOS];
  static {
    for (int elementBits = 1 ; elementBits <= BLOCK_SIZE ; elementBits++) {
      long elementPosMask = ~(~0L << elementBits);
      int[] currentShifts = SHIFTS[elementBits];
      long[] currentMasks = WRITE_MASKS[elementBits];
      for (int bitPos = 0 ; bitPos < BLOCK_SIZE ; bitPos++) {
        int base = bitPos * FAC_BITPOS;
        currentMasks[base  ] = ~((elementPosMask << currentShifts[base + 1]) >>> currentShifts[base]);
        currentMasks[base+1] = ~(elementPosMask << currentShifts[base + 2]);
        currentMasks[base+2] = currentShifts[base + 2] == 0 ? 0 : ~0;
        if (bitPos <= BLOCK_SIZE - elementBits) { // Second block not used
          currentMasks[base+1] = ~0; // Keep all bits
          currentMasks[base+2] = 0;  // Or with 0
        }
      }
    }
  }
{code}

The change is the addition of the final check for second-block usage: when an element fits entirely within the first block, the second block must be left untouched.

Likewise, the fix for Packed32 is

{code}
  private static final int[][] WRITE_MASKS =
          new int[ENTRY_SIZE][ENTRY_SIZE * FAC_BITPOS];
  static {
    for (int elementBits = 1 ; elementBits <= BLOCK_SIZE ; elementBits++) {
      int elementPosMask = ~(~0 << elementBits);
      int[] currentShifts = SHIFTS[elementBits];
      int[] currentMasks = WRITE_MASKS[elementBits];
      for (int bitPos = 0 ; bitPos < BLOCK_SIZE ; bitPos++) {
        int base = bitPos * FAC_BITPOS;
        currentMasks[base  ] = ~((elementPosMask << currentShifts[base + 1]) >>> currentShifts[base]);
        currentMasks[base+1] = ~(elementPosMask << currentShifts[base + 2]);
        currentMasks[base+2] = currentShifts[base + 2] == 0 ? 0 : ~0;
        if (bitPos <= BLOCK_SIZE - elementBits) { // Second block not used
          currentMasks[base+1] = ~0; // Keep all bits
          currentMasks[base+2] = 0;  // Or with 0
        }
      }
    }
  }
{code}

Without checking thoroughly, I'd expect the two pieces of code to be exactly the same, as the only difference between Packed32 and Packed64 is long vs. int and some constants.

The unit-test from above can be used for Packed32 by explicitly creating a Packed32 instead of calling the factory.

I'll be back behind the screen in a few days, when I can make a patch, but you are more than welcome to roll the patch yourself if it is more convenient to get it immediately.

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: generated_performance-te20100226.txt,
> LUCENE-1990-te20100122.patch, LUCENE-1990-te20100210.patch,
> LUCENE-1990-te20100212.patch, LUCENE-1990-te20100223.patch,
> LUCENE-1990-te20100226.patch, LUCENE-1990-te20100226b.patch,
> LUCENE-1990-te20100226c.patch, LUCENE-1990-te20100301.patch,
> LUCENE-1990.patch, LUCENE-1990.patch,
> LUCENE-1990_PerformanceMeasurements20100104.zip, perf-mkm-20100227.txt,
> performance-20100301.txt, performance-te20100226.txt
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl. EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage. FieldCache.StringIndex could as well. And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting. If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
> PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
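
For readers following along, the get/set interface and the create(count, maxValue) factory pictured in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions, not the Packed32/Packed64 code from the attached patches: the class name PackedLongsSketch is hypothetical, it uses an int index rather than the long index in the description, and it computes its masks on the fly instead of using precomputed SHIFTS/WRITE_MASKS tables. It does show the point of the fix discussed in the comment: the second block is only modified when an element actually spans two blocks.

```java
/**
 * Hypothetical sketch of a packed unsigned-long array backed by long[] blocks.
 * Not the Packed64 class from the LUCENE-1990 patches.
 */
final class PackedLongsSketch {
    private static final int BLOCK_SIZE = 64; // bits per long block

    private final long[] blocks;
    private final int bitsPerValue;
    private final long valueMask;
    private final int count;

    /** Mirrors the create(count, maxValue) factory idea from the issue. */
    PackedLongsSketch(int count, long maxValue) {
        if (count < 0 || maxValue < 0) {
            throw new IllegalArgumentException("count and maxValue must be >= 0");
        }
        this.count = count;
        // Bits needed to represent maxValue (at least 1, at most 63 here).
        this.bitsPerValue = Math.max(1, BLOCK_SIZE - Long.numberOfLeadingZeros(maxValue));
        // Same mask construction as elementPosMask in the comment above.
        this.valueMask = ~(~0L << bitsPerValue);
        long totalBits = (long) count * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + BLOCK_SIZE - 1) / BLOCK_SIZE)];
    }

    long get(int index) {
        checkIndex(index);
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long value = blocks[block] >>> offset;
        int bitsInFirst = BLOCK_SIZE - offset;
        if (bitsInFirst < bitsPerValue) {
            // The element continues in the low bits of the next block.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & valueMask;
    }

    void set(int index, long value) {
        checkIndex(index);
        if ((value & ~valueMask) != 0) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        // Clear the element's bits in the first block, then OR in the new bits.
        blocks[block] = (blocks[block] & ~(valueMask << offset)) | (value << offset);
        int bitsInFirst = BLOCK_SIZE - offset;
        if (bitsInFirst < bitsPerValue) {
            // Only touch the second block when the element really spans two blocks.
            blocks[block + 1] = (blocks[block + 1] & ~(valueMask >>> bitsInFirst))
                    | (value >>> bitsInFirst);
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", count " + count);
        }
    }

    public static void main(String[] args) {
        PackedLongsSketch packed = new PackedLongsSketch(8, 1000); // 10 bits per value
        for (int i = 0; i < 8; i++) {
            packed.set(i, 125L * i);
        }
        packed.set(6, 1000); // element 6 occupies bits 60..69 and spans two blocks
        for (int i = 0; i < 8; i++) {
            System.out.println(i + " -> " + packed.get(i));
        }
    }
}
```

A test in the spirit of the one mentioned in the comment would fill the array, overwrite a single element, and verify that its neighbours are unchanged, paying particular attention to elements that end exactly at a block boundary.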