yugan95 commented on PR #8159:
URL: https://github.com/apache/paimon/pull/8159#issuecomment-4646054988
@JingsongLi Thanks for the concrete suggestion! I've adopted the approach in
the updated commit:
1. resetIfTooLarge(BinaryRow currentRow) with identity check
RowHelper.resetIfTooLarge() now accepts the BinaryRow that was actually
serialized. The first condition is currentRow == reuseRow — if toBinaryRow()
returned the input BinaryRow directly (no conversion), the identity check fails
and no cleanup occurs, avoiding the stale-state issue you identified.
2. InternalRowSerializer captures binaryRow before try
Both serialize() and serializeToPages() now follow the pattern you
suggested:
```
BinaryRow binaryRow = toBinaryRow(row);
try {
binarySerializer.serialize(binaryRow, target);
} finally {
rowHelper.resetIfTooLarge(binaryRow);
}
```
3. Test coverage — added testSkipsWhenCurrentRowIsNotReuseRow() to verify
that passing an external BinaryRow (simulating the BinaryRow input path) does
NOT trigger buffer release even when the helper's buffer is oversized.
Regarding nested row fields and direct toBinaryRow() callers — I agree
those are real gaps. For this PR I'd like to keep the scope focused on the
serialize/serializeToPages entry points which are the primary OOM path. The
nested AbstractBinaryWriter.writeRow() hook and coverage for
DataPagedOutputSerializer.write() can be addressed in a follow-up. Would that
work for you?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]