tangdh created LUCENE-10619:
-------------------------------

             Summary: Optimize the writeBytes in TermsHashPerField
                 Key: LUCENE-10619
                 URL: https://issues.apache.org/jira/browse/LUCENE-10619
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/index
    Affects Versions: 9.2
            Reporter: tangdh


Because the caller does not know the length of the current slice, writeBytes 
always writes one byte at a time instead of copying a block of bytes.

Maybe we could return both the offset and the length from ByteBlockPool#allocSlice?
1. BYTE_BLOCK_SIZE is 32768, so the offset needs at most 15 bits.
2. The slice size is at most 200, so the length fits in 8 bits.
So we could pack them together into a single int: offset | length (offset in 
the high bits, length in the low 8 bits).
This function is used in only two places, so the cost of changing it is 
relatively small.
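
As a rough illustration of the packing described above, here is a minimal sketch. The class name SliceRef and its methods are hypothetical and not part of Lucene; the only assumptions taken from this issue are the 15-bit offset and the 8-bit length.

```java
// Hypothetical helper, not part of Lucene: packs a slice offset and a slice
// length into one int, with the length in the low 8 bits.
final class SliceRef {
    private static final int LENGTH_BITS = 8;
    private static final int LENGTH_MASK = (1 << LENGTH_BITS) - 1; // 0xff
    private static final int MAX_OFFSET = (1 << 15) - 1;           // BYTE_BLOCK_SIZE - 1

    private SliceRef() {}

    /** Packs offset (0..32767) and length (0..255) into a single int. */
    static int pack(int offset, int length) {
        if (offset < 0 || offset > MAX_OFFSET) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        if (length < 0 || length > LENGTH_MASK) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        return (offset << LENGTH_BITS) | length;
    }

    /** Extracts the offset stored in the high bits. */
    static int offset(int packed) {
        return packed >>> LENGTH_BITS;
    }

    /** Extracts the length stored in the low 8 bits. */
    static int length(int packed) {
        return packed & LENGTH_MASK;
    }
}
```

The packed value uses at most 23 bits, so it is never negative and the signed and unsigned right shifts give the same offset.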

Once allocSlice returns the offset and length of the new slice, writeBytes 
could look like this:

{code:java}
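// Copy a block of bytes per slice instead of one byte at a time.
while (remaining > 0) {
   // Packed result: slice offset in the high bits, slice length in the low 8 bits.
   int offsetAndLength = allocSlice(bytes, offset);
   length = Math.min(remaining, (offsetAndLength & 0xff) - 1);
   offset = offsetAndLength >> 8;
   System.arraycopy(src, srcPos, bytePool.buffer, offset, length);
   srcPos    += length;   // advance past the source bytes just copied
   remaining -= length;
   offset    += (length + 1);
}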
{code}

If this sounds like a good idea, I'd like to open a PR.




