[PR] [core] Fix OOM when writing/compacting table with large records [paimon]

via GitHub Thu, 09 Apr 2026 20:18:45 -0700


yugan95 opened a new pull request, #7621:
URL: https://github.com/apache/paimon/pull/7621


   ### Purpose
   Linked issue: close #7620
   Fix OOM when writing or compacting tables with large records (100MB+), 
caused by unbounded buffer growth in the sort, merge, and compaction paths.
   Heap dump analysis identified four independent root causes:
   
   **1. Sort path — `RowHelper` internal buffer never shrinks**
   
   `RowHelper.reuseWriter` grows its internal `MemorySegment` list for large 
records, but `BinaryRowWriter.reset()` only resets the cursor without releasing 
oversized segments. Additionally, `InternalRowSerializer.serialize()` can exit 
via `EOFException` (a normal signal when the sort buffer is full), skipping any 
cleanup of the bloated buffer.
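
A minimal sketch of the pattern described above, assuming a plain `byte[]` in place of the `MemorySegment` list. `ReusableWriteBuffer` and its methods are hypothetical stand-ins, not Paimon's `RowHelper` or `InternalRowSerializer`. It only illustrates why the cleanup has to sit in a `finally` block: the `EOFException` exit would otherwise skip it.

```java
import java.io.EOFException;

/** Hypothetical stand-in for a reusable row writer; this is not Paimon's RowHelper. */
final class ReusableWriteBuffer {
    /** 4MB threshold taken from the PR description. */
    static final int SHRINK_THRESHOLD = 4 * 1024 * 1024;
    private static final int INITIAL_SIZE = 64;

    private byte[] buffer = new byte[INITIAL_SIZE];
    private int cursor;

    /** Grows the backing array when the record does not fit, then copies it in. */
    void write(byte[] record) {
        if (record.length > buffer.length) {
            buffer = new byte[record.length];
        }
        System.arraycopy(record, 0, buffer, 0, record.length);
        cursor = record.length;
    }

    /** Mirrors the problematic behavior: rewinds the cursor, keeps the capacity. */
    void reset() {
        cursor = 0;
    }

    /** Drops the backing array once it has grown past the threshold. */
    void resetIfTooLarge() {
        if (buffer.length > SHRINK_THRESHOLD) {
            buffer = new byte[INITIAL_SIZE];
            cursor = 0;
        }
    }

    int capacity() {
        return buffer.length;
    }

    /**
     * Stages a record, then signals a full target with EOFException.
     * The finally block runs on both the normal and the exceptional exit.
     */
    static void serialize(ReusableWriteBuffer writer, byte[] record, int targetRemaining)
            throws EOFException {
        try {
            writer.reset();
            writer.write(record);
            if (writer.cursor > targetRemaining) {
                throw new EOFException("target buffer is full");
            }
            // A real serializer would copy the staged bytes into the target here.
        } finally {
            writer.resetIfTooLarge();
        }
    }
}
```

In this sketch a 5MB record that hits the full-target path still leaves the writer holding only its small initial array, while records under the threshold keep their buffer for reuse.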
   
   **2. Merge path — `BinaryRowSerializer.deserialize(reuse)` only grows, never 
shrinks**
   
   Each merge channel holds a `BinaryRow` reuse instance. When a large record 
is deserialized, the backing `MemorySegment` grows to fit it but is never 
shrunk for subsequent small records. With `max-num-file-handles` (default 128) 
channels each retaining a 100MB+ buffer, the merge alone can pin more than 
12GB (128 × 100MB) of memory.
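
The shrink rule and the worst-case retention can be sketched as follows. This assumes a plain `byte[]` reuse buffer; `ReuseBufferSketch` and its method names are hypothetical and do not reflect the actual `BinaryRowSerializer` code.

```java
/** Hypothetical sketch of shrink-on-reuse; not Paimon's BinaryRowSerializer. */
final class ReuseBufferSketch {
    /** 4MB threshold taken from the PR description. */
    static final int SHRINK_THRESHOLD = 4 * 1024 * 1024;

    /**
     * Returns a buffer able to hold {@code length} bytes.
     * The reuse buffer is kept only if it is large enough and not oversized;
     * otherwise a new array sized to the current record is allocated.
     */
    static byte[] bufferFor(byte[] reuse, int length) {
        boolean tooSmall = reuse == null || reuse.length < length;
        boolean oversized = reuse != null && reuse.length > SHRINK_THRESHOLD;
        if (tooSmall || oversized) {
            return new byte[length];
        }
        return reuse;
    }

    /** Memory pinned if every merge channel keeps a buffer sized for the largest record. */
    static long worstCaseRetainedBytes(int channels, long largestRecordBytes) {
        return Math.multiplyExact((long) channels, largestRecordBytes);
    }
}
```

With 128 channels and 100MiB records, `worstCaseRetainedBytes` returns 13,421,772,800 bytes (12.5GiB), which is the retention the shrink rule is meant to avoid.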
   
   **3. Compaction read path — `HeapBytesVector.reserveBytes()` integer 
overflow**
   
   `reserveBytes()` computes `newCapacity * 2` using plain `int` multiplication. 
When `newCapacity` exceeds `Integer.MAX_VALUE / 2` (~1.07 billion bytes), the 
result wraps around to a negative value, causing `NegativeArraySizeException` 
or silent data corruption.
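
The overflow and one overflow-safe alternative can be sketched as below. `CapacityGrowthSketch` is hypothetical: widening to `long` before shifting and the choice of `IllegalArgumentException` are assumptions made for illustration, not necessarily what `HeapBytesVector` does.

```java
/** Hypothetical sketch of overflow-safe capacity doubling; not Paimon's HeapBytesVector. */
final class CapacityGrowthSketch {
    /** Cap named in the PR description. */
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /** Plain int doubling: wraps to a negative value above Integer.MAX_VALUE / 2. */
    static int naiveDouble(int required) {
        return required * 2;
    }

    /** Doubles in 64-bit arithmetic, then clamps to the largest safe array size. */
    static int safeDouble(int required) {
        if (required < 0 || required > MAX_ARRAY_SIZE) {
            throw new IllegalArgumentException(
                    "Required capacity " + required + " exceeds the limit " + MAX_ARRAY_SIZE);
        }
        long doubled = ((long) required) << 1;
        return (int) Math.min(doubled, (long) MAX_ARRAY_SIZE);
    }
}
```

For a required capacity of 1,100,000,000 bytes, `naiveDouble` yields a negative number, while `safeDouble` clamps to `MAX_ARRAY_SIZE`.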
   
   **4. Parquet write — statistics and page-size-check config not passed 
through**
   
   `RowDataParquetBuilder` does not pass through 
`parquet.statistics.truncate.length`, `parquet.columnindex.truncate.length`, 
`parquet.page.size.row.check.min`, and `parquet.page.size.row.check.max`. 
Without these, users cannot tune Parquet behavior for large-record scenarios, 
leading to multi-GB pages and bloated footers.
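
A rough back-of-the-envelope model of why pages grow so large, assuming the writer only checks the page size after a minimum number of rows has been buffered. The value 100 used below for `parquet.page.size.row.check.min` is an assumption about the Parquet default and should be verified against the Parquet version in use; `PageSizeCheckSketch` is a hypothetical helper, not part of Parquet or Paimon.

```java
/** Hypothetical arithmetic helper; a simplified model, not Parquet's actual writer logic. */
final class PageSizeCheckSketch {
    /** Bytes buffered in a page before the first size check can trigger a flush. */
    static long bytesBufferedBeforeFirstCheck(long minRowCountForCheck, long recordBytes) {
        return Math.multiplyExact(minRowCountForCheck, recordBytes);
    }
}
```

Under that assumption, 100 rows of 100MiB each amount to 10,485,760,000 bytes (about 9.8GiB) before the first check, whereas a minimum row count of 1 would bound the page to roughly one record.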
   
   #### Changes
   
   1. **`RowHelper`**: add `resetIfTooLarge()` — release internal buffer when 
segments exceed 4MB
   2. **`InternalRowSerializer`**: call `resetIfTooLarge()` in `finally` block 
of `serialize()` and `serializeToPages()` to handle `EOFException` exit path
   3. **`BinaryRowSerializer`**: add shrink logic in `deserialize(reuse)` — 
reallocate when existing buffer > 4MB threshold
   4. **`HeapBytesVector`**: use bit-shift (`<< 1`) instead of `* 2`, cap at 
`MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8`, throw clear error on overflow
   5. **`RowDataParquetBuilder`**: pass through `statistics.truncate.length`, 
`columnindex.truncate.length`, `min-row-count-for-page-size-check`, 
`max-row-count-for-page-size-check` from config
   
   ### Tests
   - `RowHelperTest` — validates `resetIfTooLarge()` releases oversized buffers 
(> 4MB) and preserves small ones
   - `BinaryRowSerializerShrinkTest` — validates `deserialize(reuse)` shrinks 
oversized buffers and preserves small ones
   - `HeapBytesVectorReserveBytesTest` — validates overflow-safe 
`reserveBytes()` growth and data correctness
   
   ### API and Format
   
   N/A — no public API or format changes.
   
   ### Documentation
   
   N/A


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
