yugan95 commented on PR #8159:
URL: https://github.com/apache/paimon/pull/8159#issuecomment-4646054988

   @JingsongLi Thanks for the concrete suggestion! I've adopted the approach in 
the updated commit:
   
     1. resetIfTooLarge(BinaryRow currentRow) with identity check
   
     RowHelper.resetIfTooLarge() now accepts the BinaryRow that was actually 
serialized. The first condition is currentRow == reuseRow — if toBinaryRow() 
returned the input BinaryRow directly (no conversion), the identity check fails 
and no cleanup occurs, avoiding the stale-state issue you identified.
   
     2. InternalRowSerializer captures binaryRow before try
   
     Both serialize() and serializeToPages() now follow the pattern you 
suggested:
   ```
     BinaryRow binaryRow = toBinaryRow(row);
     try {
         binarySerializer.serialize(binaryRow, target);
     } finally {
         rowHelper.resetIfTooLarge(binaryRow);
     }
   ```
     3. Test coverage — added testSkipsWhenCurrentRowIsNotReuseRow() to verify 
that passing an external BinaryRow (simulating the BinaryRow input path) does 
NOT trigger buffer release even when the helper's buffer is oversized.
   
     Regarding nested row fields and direct toBinaryRow() callers — I agree 
those are real gaps. For this PR I'd like to keep the scope focused on the 
serialize/serializeToPages entry points which are the primary OOM path. The 
nested AbstractBinaryWriter.writeRow() hook and coverage for 
DataPagedOutputSerializer.write() can be addressed in a follow-up. Would that 
work for you?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to