tangdh created LUCENE-10619:
-------------------------------
Summary: Optimize the writeBytes in TermsHashPerField
Key: LUCENE-10619
URL: https://issues.apache.org/jira/browse/LUCENE-10619
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Affects Versions: 9.2
Reporter: tangdh
Because writeBytes doesn't know the length of the current slice, it always
writes one byte at a time instead of copying a block of bytes.
Maybe ByteBlockPool#allocSlice could return both the offset and the length?
1. BYTE_BLOCK_SIZE is 32768, so an offset within a block needs at most 15 bits.
2. A slice is at most 200 bytes, so its length fits in 8 bits.
So the two could be packed into a single int: (offset << 8) | length
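To make the bit layout concrete, here is a minimal sketch of the packing described above. The class and method names (SliceRef, pack, offset, length) are hypothetical and not part of ByteBlockPool; only the two size limits come from this issue.

```java
// Illustrative only: SliceRef and its methods are assumptions for this sketch,
// not existing Lucene API.
final class SliceRef {
  static final int BYTE_BLOCK_SIZE = 32768; // value quoted in this issue
  static final int MAX_SLICE_LENGTH = 200;  // largest slice size quoted in this issue

  // Packs a block-relative offset (15 bits) and a slice length (8 bits) into one int.
  static int pack(int offset, int length) {
    if (offset < 0 || offset >= BYTE_BLOCK_SIZE) {
      throw new IllegalArgumentException("offset out of range: " + offset);
    }
    if (length < 0 || length > MAX_SLICE_LENGTH) {
      throw new IllegalArgumentException("length out of range: " + length);
    }
    return (offset << 8) | length;
  }

  // Upper bits hold the offset.
  static int offset(int packed) {
    return packed >>> 8;
  }

  // Low 8 bits hold the length.
  static int length(int packed) {
    return packed & 0xff;
  }
}
```

The packed value uses 23 bits at most, so it always stays a non-negative int and the callers can unpack it with a shift and a mask.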
This function is used in only two places, so the cost of changing it is
relatively small.
Once allocSlice returns both the offset and the length of the new slice,
writeBytes could look like this:
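{code:java}
// Sketch: copy a block of bytes per slice instead of one byte at a time.
// Assumes offset points at the end-of-slice marker of the current slice.
while (remaining > 0) {
  int offsetAndLength = bytePool.allocSlice(bytes, offset);
  bytes = bytePool.buffer; // allocSlice may have switched to a new block
  offset = offsetAndLength >> 8; // upper bits: start of the new slice
  // low 8 bits: slice length; its last byte is reserved for the end marker
  int length = Math.min(remaining, (offsetAndLength & 0xff) - 1);
  System.arraycopy(src, srcPos, bytes, offset, length);
  srcPos += length;
  remaining -= length;
  offset += length; // sits on the end marker when the slice is full
}
{code}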
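As a rough, self-contained illustration of the loop, the toy model below copies a source array across slices with one System.arraycopy per slice. Everything in it is an assumption made for the sketch: ChunkedWriteDemo, its single flat buffer, and the slice size table are stand-ins, and it does not model forwarding addresses, slice levels stored in the buffer, or multiple blocks.

```java
// Toy model only: not ByteBlockPool. Names and sizes are assumptions for this sketch.
final class ChunkedWriteDemo {
  // Assumed growth schedule for slice sizes, capped at 200 bytes.
  static final int[] SIZES = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};

  final byte[] buffer = new byte[32768];
  private int nextFree = 0;
  private int level = 0;

  // Hypothetical allocator: reserves the next slice and returns (offset << 8) | length.
  int allocSlice() {
    int size = SIZES[level];
    if (nextFree + size > buffer.length) {
      throw new IllegalStateException("block full");
    }
    if (level < SIZES.length - 1) {
      level++;
    }
    int offset = nextFree;
    nextFree += size;
    return (offset << 8) | size;
  }

  // Copies src into successive slices, leaving the last byte of each slice
  // untouched (the slot a real pool would use as the end marker).
  // Returns the number of slices allocated.
  int writeBytes(byte[] src) {
    int srcPos = 0;
    int remaining = src.length;
    int slices = 0;
    while (remaining > 0) {
      int packed = allocSlice();
      int offset = packed >>> 8;
      int length = Math.min(remaining, (packed & 0xff) - 1);
      System.arraycopy(src, srcPos, buffer, offset, length);
      srcPos += length;
      remaining -= length;
      slices++;
    }
    return slices;
  }
}
```

With these assumed sizes, writing 10 bytes takes two slices: 4 bytes go into the first 5-byte slice and the remaining 6 into the second, so the copy count grows with the number of slices rather than the number of bytes.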
If this sounds like a good idea, I'd like to open a PR.