JingsongLi commented on PR #8159:
URL: https://github.com/apache/paimon/pull/8159#issuecomment-4660975874
Thanks for updating the cleanup ownership check. I think the current fixed
4MB threshold is acceptable as a first guardrail, especially because
`paimon-common` does not have configuration context and making this
configurable would add a lot of plumbing for a focused bug fix.
One possible improvement to the release strategy: instead of only releasing
when the current row is smaller than 4MB, we can combine the fixed cap with a
relative hysteresis check, for example:
```java
bufferCapacity > MAX_RETAINED_REUSE_BUFFER_SIZE
&& bufferCapacity > currentRow.getSizeInBytes() * SHRINK_RATIO
```
This keeps the good parts of the current approach:
- normal small rows never pay for cleanup unless the buffer was actually
inflated;
- sustained large rows do not thrash by reallocating every record;
- a 100MB spike followed by tiny rows gets released.
But it also handles the case where a 100MB spike is followed by medium-size
rows, e.g. 5MB rows. With the current `currentRow < 4MB` condition, the helper
may keep retaining the 100MB buffer even though the active workload has dropped
far below that size. A fixed cap plus ratio check would release only when the
retained buffer is clearly oversized relative to the row currently being
serialized.
I do not think this needs to become configurable in this PR. If we want to
keep the patch small, I am also okay with merging the current version and
treating the relative hysteresis as a follow-up refinement.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]