knoxy5467 opened a new pull request, #20358: URL: https://github.com/apache/kafka/pull/20358
### Summary

This PR fixes two critical issues in producer batch splitting that can cause infinite retry loops and stack overflow errors when `batch.size` is significantly larger than the message size limit configured on the broker.

### Issues Addressed

- **KAFKA-8350**: Producers endlessly retry batch splitting when `batch.size` is much larger than the topic-level `max.message.bytes` (broker-level `message.max.bytes`), leading to infinite retry loops with "MESSAGE_TOO_LARGE" errors
- **KAFKA-8202**: Stack overflow errors in `FutureRecordMetadata.chain()` due to excessive recursive splitting attempts

### Root Cause

The existing batch splitting logic in `RecordAccumulator.splitAndReenqueue()` always used the configured `batchSize` parameter as the split target, regardless of whether the batch had already been split before. This caused:

1. **Infinite loops**: When `batch.size` (e.g., 8MB) >> `message.max.bytes` (e.g., 1MB), splits never succeeded because the split size was still too large
2. **Stack overflow**: Repeated splitting attempts created deep call chains in the metadata chaining logic

### Solution

Implemented progressive batch splitting logic:

```java
// First split: keep the configured batch size as the target.
int maxBatchSize = this.batchSize;
if (bigBatch.isSplitBatch()) {
    // Already split once: halve the target, but never go below the largest record.
    maxBatchSize = Math.max(bigBatch.maxRecordSize, bigBatch.estimatedSizeInBytes() / 2);
}
```

**Key improvements:**

- **First split**: Uses the original `batchSize` (maintains backward compatibility)
- **Subsequent splits**: Use the larger of:
  - `maxRecordSize`: Ensures we can always split down to individual records
  - `estimatedSizeInBytes() / 2`: Provides geometric reduction for faster convergence

### Testing

Added a comprehensive test, `testSplitAndReenqueuePreventInfiniteRecursion()`, that:

- Creates oversized batches with 100 records of 1KB each
- Verifies splitting can reduce batches to single-record size
- Ensures no infinite recursion (safety limit of 100 operations)
- Validates no data loss or duplication during splitting
- Confirms all original records are preserved with correct keys

### Backward Compatibility

- No breaking changes to public APIs
- First split attempt still uses the original `batchSize` configuration
- Progressive splitting only engages for retry scenarios
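### Illustration

As a rough illustration of the convergence argument in the Solution section, the sketch below models the split target across retries. It is a toy model, not the code in this PR: `SplitSizeSketch`, `nextSplitTarget`, and `roundsUntilAccepted` are hypothetical names, and the model treats a batch as a plain byte count, ignoring record boundaries, compression, and batch overhead.

```java
// Toy model of the split-target selection; not part of Kafka.
final class SplitSizeSketch {

    // Mirrors the snippet in the Solution section: the first split uses the
    // configured batch size, later splits halve the current size but never
    // go below the largest single record.
    static int nextSplitTarget(int configuredBatchSize, boolean alreadySplit,
                               int maxRecordSize, int estimatedSizeInBytes) {
        if (!alreadySplit) {
            return configuredBatchSize;
        }
        return Math.max(maxRecordSize, estimatedSizeInBytes / 2);
    }

    // Counts split rounds until the largest resulting batch fits under
    // brokerLimit. Returns -1 if maxRounds is reached first (assumed safety cap).
    // With progressive == false the target stays at configuredBatchSize,
    // which models the behavior described under Root Cause.
    static int roundsUntilAccepted(int batchBytes, int configuredBatchSize,
                                   int maxRecordSize, int brokerLimit,
                                   boolean progressive, int maxRounds) {
        int current = batchBytes;
        boolean alreadySplit = false;
        int rounds = 0;
        while (current > brokerLimit) {
            if (rounds == maxRounds) {
                return -1; // never converged within the safety cap
            }
            int target = progressive
                    ? nextSplitTarget(configuredBatchSize, alreadySplit, maxRecordSize, current)
                    : configuredBatchSize;
            // Simplification: the largest sub-batch is as big as the target allows.
            current = Math.min(current, target);
            alreadySplit = true;
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        int mb = 1024 * 1024;
        System.out.println("progressive: "
                + roundsUntilAccepted(8 * mb, 8 * mb, 1024, mb, true, 100));
        System.out.println("fixed target: "
                + roundsUntilAccepted(8 * mb, 8 * mb, 1024, mb, false, 100));
    }
}
```

Under these assumptions, an 8MB batch against a 1MB limit is accepted after four rounds with the progressive target (8MB, 4MB, 2MB, 1MB), while the fixed target never shrinks and hits the safety cap.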