JingsongLi commented on PR #8159:
URL: https://github.com/apache/paimon/pull/8159#issuecomment-4660975874

   Thanks for updating the cleanup ownership check. I think the current fixed 
4MB threshold is acceptable as a first guardrail, especially because 
`paimon-common` does not have configuration context and making this 
configurable would add a lot of plumbing for a focused bug fix.
   
   One possible improvement to the release strategy: instead of only releasing 
when the current row is smaller than 4MB, we can combine the fixed cap with a 
relative hysteresis check, for example:
   
   ```java
   bufferCapacity > MAX_RETAINED_REUSE_BUFFER_SIZE
           && bufferCapacity > currentRow.getSizeInBytes() * SHRINK_RATIO
   ```
   
   This keeps the good parts of the current approach:
   
   - normal small rows never pay for cleanup unless the buffer was actually 
inflated;
   - sustained large rows do not thrash by reallocating every record;
   - a 100MB spike followed by tiny rows gets released.
   
   But it also handles the case where a 100MB spike is followed by medium-size 
rows, e.g. 5MB rows. With the current `currentRow < 4MB` condition, the helper 
may keep retaining the 100MB buffer even though the active workload has dropped 
far below that size. A fixed cap plus ratio check would release only when the 
retained buffer is clearly oversized relative to the row currently being 
serialized.
   
   I do not think this needs to become configurable in this PR. If we want to 
keep the patch small, I am also okay with merging the current version and 
treating the relative hysteresis as a follow-up refinement.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to