uschindler commented on a change in pull request #327: URL: https://github.com/apache/lucene/pull/327#discussion_r719127442
##########
File path: lucene/core/src/java/org/apache/lucene/util/packed/DirectWriter.java
##########
@@ -94,38 +91,54 @@ private void flush() throws IOException {
     }
     // Avoid writing bits from values that are outside of the range we need to encode
     Arrays.fill(nextValues, off, nextValues.length, 0L);
-    encode(nextValues, 0, nextBlocks, 0, iterations);
+    encode(nextValues, off, nextBlocks, bitsPerValue);
     final int blockCount = (int) PackedInts.Format.PACKED.byteCount(PackedInts.VERSION_CURRENT, off, bitsPerValue);
     output.writeBytes(nextBlocks, blockCount);
     off = 0;
   }
 
-  public void encode(
-      long[] values, int valuesOffset, byte[] blocks, int blocksOffset, int iterations) {
-    int nextBlock = 0;
-    int bitsUsed = 0;
-    for (int i = 0; i < byteValueCount * iterations; ++i) {
-      final long v = values[valuesOffset++];
-      assert PackedInts.unsignedBitsRequired(v) <= bitsPerValue;
-      if (bitsUsed < byteOffset) {
-        // just buffer
-        nextBlock |= v << bitsUsed;
-        bitsUsed += bitsPerValue;
-      } else {
-        // flush as many blocks as possible
-        blocks[blocksOffset++] = (byte) (nextBlock | (v << bitsUsed));
-        int bits = 8 - bitsUsed;
-        while (bits <= bitsUsedOffset) {
-          blocks[blocksOffset++] = (byte) (v >> bits);
-          bits += 8;
+  private static void encode(long[] nextValues, int upTo, byte[] nextBlocks, int bitsPerValue) {
+    if ((bitsPerValue & 7) == 0) {
+      // bitsPerValue is a multiple of 8: 8, 16, 24, 32, 40, 48, 56, 64
+      final int bytesPerValue = bitsPerValue / Byte.SIZE;
+      for (int i = 0, o = 0; i < upTo; ++i, o += bytesPerValue) {

Review comment:
Hi, it was too late yesterday to run any tests. I just drafted my idea and went to sleep. I don't know which benchmark you used (lucenebench, and how was it invoked? The taxidriver bench was completely new to me, and I have no idea how to start it). I can quickly run a benchmark, but I want the numbers to be comparable, so it should be the same one you used.

> It should hoist it as a loop constant.

I hope so; I am just afraid that the code is too complex (4 branches). In addition: how many iterations does the loop have? If encode() is called often for loops with only a few elements, then I am not sure it helps, because whenever the bit size changes between calls to encode(), the code will be deoptimized and it starts over again. That is why I made the 2 drafts. I would now prefer #333 because of its cleaner code. Both should behave similarly.
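
The hoisting concern can be made concrete with a minimal sketch. This is not the code from this PR or from #333; the class name `HoistedEncodeSketch`, its helper methods, and the little-endian byte order are assumptions made purely for illustration. The idea is to decide on `bitsPerValue` once per call so that each loop body contains no branch on it, instead of relying on the JIT to hoist the check as a loop constant.

```java
// Hypothetical sketch: names and byte order are assumptions, not Lucene's API.
final class HoistedEncodeSketch {

  // Dispatch on bitsPerValue once per call; the loops below never re-check it.
  static void encode(long[] values, int upTo, byte[] blocks, int bitsPerValue) {
    if ((bitsPerValue & 7) != 0 || bitsPerValue < 8 || bitsPerValue > 64) {
      throw new IllegalArgumentException("only multiples of 8 in [8, 64] are handled here");
    }
    switch (bitsPerValue) {
      case 8:
        encode8(values, upTo, blocks);
        break;
      case 16:
        encode16(values, upTo, blocks);
        break;
      default:
        encodeBytes(values, upTo, blocks, bitsPerValue / Byte.SIZE);
        break;
    }
  }

  // One byte per value.
  private static void encode8(long[] values, int upTo, byte[] blocks) {
    for (int i = 0; i < upTo; ++i) {
      blocks[i] = (byte) values[i];
    }
  }

  // Two bytes per value, least significant byte first (assumed order).
  private static void encode16(long[] values, int upTo, byte[] blocks) {
    for (int i = 0, o = 0; i < upTo; ++i, o += 2) {
      final long v = values[i];
      blocks[o] = (byte) v;
      blocks[o + 1] = (byte) (v >>> 8);
    }
  }

  // Generic fallback for 3 to 8 bytes per value, least significant byte first.
  private static void encodeBytes(long[] values, int upTo, byte[] blocks, int bytesPerValue) {
    for (int i = 0, o = 0; i < upTo; ++i, o += bytesPerValue) {
      final long v = values[i];
      for (int b = 0; b < bytesPerValue; ++b) {
        blocks[o + b] = (byte) (v >>> (8 * b));
      }
    }
  }
}
```

Whether splitting the loops like this actually beats a single loop with the check left inside depends on how many values each call encodes and how often `bitsPerValue` changes between calls, so it would need to be measured with the same benchmark on both variants.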