vigyasharma commented on code in PR #966:
URL: https://github.com/apache/lucene/pull/966#discussion_r901031392
##########
lucene/core/src/java/org/apache/lucene/index/TermsHashPerField.java:
##########
@@ -230,9 +230,29 @@ final void writeByte(int stream, byte b) {
}
final void writeBytes(int stream, byte[] b, int offset, int len) {
- // TODO: optimize
final int end = offset + len;
- for (int i = offset; i < end; i++) writeByte(stream, b[i]);
+ int streamAddress = streamAddressOffset + stream;
+ int upto = termStreamAddressBuffer[streamAddress];
+ byte[] slice = bytePool.buffers[upto >> ByteBlockPool.BYTE_BLOCK_SHIFT];
+ assert slice != null;
+ int sliceOffset = upto & ByteBlockPool.BYTE_BLOCK_MASK;
+
+ while (slice[sliceOffset] == 0 && offset < end) {
+ slice[sliceOffset++] = b[offset++];
+ (termStreamAddressBuffer[streamAddress])++;
+ }
+
+ while (offset < end) {
+ int offsetAndLength = bytePool.allocKnownSizeSlice(slice, sliceOffset);
+ sliceOffset = offsetAndLength >> 8;
+ int sliceLength = offsetAndLength & 0xff;
+ slice = bytePool.buffer;
+ int writeLength = Math.min(sliceLength - 1, end - offset);
+ System.arraycopy(b, offset, slice, sliceOffset, writeLength);
+ sliceOffset += writeLength;
+ offset += writeLength;
+ termStreamAddressBuffer[streamAddress] = sliceOffset +
bytePool.byteOffset;
+ }
Review Comment:
Wow, this is an interesting change! Can you help me better understand this
optimization?
Is `System.arraycopy()` much faster than manually copying like in your while
loop above. I would guess it is similar since there are no extra checks for
primitive or object type [[ref: fast array copy
comparison](https://www.javaspecialists.eu/archive/Issue124-Copying-Arrays-Fast.html)],
but I'm not sure.
This is probably better than calling `writeByte()` in a loop because we
don't need to fetch all the pointers every time.
A follow up idea: I think `allocSlice()` (or `allocKnownSizeSlice()` here),
always returns the next higher sized buffer slice.. Maybe `allocKnownSizeSlice`
could instead return a slice appropriate for the length of pending items we
need to write `(end - offset)`? We could then avoid having to call alloc
multiple times for larger writes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]