ashish159357 opened a new pull request, #15330:
URL: https://github.com/apache/lucene/pull/15330
Problem
ByteBlockPool uses 32 KB buffers with an integer offset tracker (byteOffset).
When more than 65,535 buffers are allocated, the byteOffset calculation
(byteOffset = bufferUpto * BYTE_BLOCK_SIZE) overflows an int, causing an
ArithmeticException while indexing documents with very large numbers of tokens.
Root Cause
- Each buffer is 32KB (BYTE_BLOCK_SIZE = 32768)
- Maximum safe buffer count: Integer.MAX_VALUE / BYTE_BLOCK_SIZE = 65535
- When bufferUpto > 65535, the multiplication overflows (65535 * 32768 =
2,147,450,880 still fits in an int; 65536 * 32768 = 2,147,483,648 does not)
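The arithmetic in this list can be checked directly. The following is a minimal, standalone sketch and not Lucene code: the class ByteOffsetOverflowDemo and its helper methods are hypothetical names introduced only for illustration. It uses Math.multiplyExact to show that the first value of bufferUpto whose product with BYTE_BLOCK_SIZE no longer fits in an int is 65536.

```java
// Illustrative only: reproduces the arithmetic from the list above, not Lucene's code.
final class ByteOffsetOverflowDemo {
  // 32 KB per buffer, as stated above (BYTE_BLOCK_SIZE = 32768).
  static final int BYTE_BLOCK_SIZE = 1 << 15;

  // Largest bufferUpto whose product with BYTE_BLOCK_SIZE still fits in an int.
  static int maxSafeBufferUpto() {
    return Integer.MAX_VALUE / BYTE_BLOCK_SIZE; // integer division
  }

  // True if bufferUpto * BYTE_BLOCK_SIZE does not fit in an int.
  static boolean overflows(int bufferUpto) {
    try {
      Math.multiplyExact(bufferUpto, BYTE_BLOCK_SIZE);
      return false;
    } catch (ArithmeticException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(maxSafeBufferUpto()); // 65535
    System.out.println(overflows(65535));    // false
    System.out.println(overflows(65536));    // true
  }
}
```

Plain int multiplication in Java wraps silently; an ArithmeticException is only raised by the exact-arithmetic helpers such as Math.multiplyExact or Math.addExact.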
Solution
Implement proactive DWPT flushing when buffer count approaches the limit:
1. Detection: Added isApproachingBufferLimit() method to detect when buffer
count approaches the overflow threshold
2. Propagation: Buffer limit status flows from ByteBlockPool →
IndexingChain → DocumentsWriterPerThread → DocumentsWriterFlushControl
3. Prevention: Force a DWPT flush before the overflow occurs, similar to the
existing RAM-based flushing.
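The three steps above can be sketched as follows. This is a simplified model under stated assumptions, not the code in this pull request: Pool, Writer, and the static doAfterDocument are hypothetical stand-ins for ByteBlockPool, DocumentsWriterPerThread, and DocumentsWriterFlushControl.doAfterDocument(). Only the method name isApproachingBufferLimit() and the threshold of 65,000 come from this description. The >= comparison is also an assumption.

```java
// Hypothetical sketch: the nested classes are stand-ins, not Lucene's actual
// classes or signatures. Only isApproachingBufferLimit() and the 65,000
// threshold are taken from the PR description.
final class BufferLimitSketch {
  static final int BUFFER_LIMIT_THRESHOLD = 65_000; // safety margin below 65,535

  /** Stand-in for the byte block pool: tracks the index of the current buffer. */
  static final class Pool {
    int bufferUpto = -1; // no buffer allocated yet

    void nextBuffer() {
      bufferUpto++;
    }

    // Step 1, detection (comparison operator is an assumption).
    boolean isApproachingBufferLimit() {
      return bufferUpto >= BUFFER_LIMIT_THRESHOLD;
    }
  }

  /** Stand-in for the per-thread writer that owns the pool. */
  static final class Writer {
    final Pool pool = new Pool();
    boolean flushPending;

    // Step 2, propagation: expose the pool's status to the flush control.
    boolean isApproachingBufferLimit() {
      return pool.isApproachingBufferLimit();
    }
  }

  // Step 3, prevention: stand-in for the after-document hook, which marks the
  // writer for flushing before the offset can overflow.
  static void doAfterDocument(Writer writer) {
    if (writer.isApproachingBufferLimit()) {
      writer.flushPending = true;
    }
  }
}
```

With a threshold of 65,000, a writer is marked for flushing with 535 buffers (about 16.7 MiB at 32 KB each) of headroom left before the offset arithmetic would overflow.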
Key Changes
- Added buffer limit detection in ByteBlockPool
- Integrated check into DocumentsWriterFlushControl.doAfterDocument()
- Uses a threshold of 65,000 to provide a safety margin before the actual
limit of 65,535
- Maintains existing performance characteristics while preventing crashes