JingsongLi commented on PR #8159:
URL: https://github.com/apache/paimon/pull/8159#issuecomment-4645749480

   To make the suggestion more concrete, the shape I had in mind is something 
like this: let the serializer keep cleanup next to the conversion result, and 
make `RowHelper` release only when the row being serialized is actually its 
reusable row.
   
   ```java
   @Override
   public void serialize(InternalRow row, DataOutputView target) throws 
IOException {
       BinaryRow binaryRow = toBinaryRow(row);
       try {
           binarySerializer.serialize(binaryRow, target);
       } finally {
           rowHelper.resetIfTooLarge(binaryRow);
       }
   }
   
   @Override
   public int serializeToPages(InternalRow row, AbstractPagedOutputView target) 
throws IOException {
       BinaryRow binaryRow = toBinaryRow(row);
       try {
           return binarySerializer.serializeToPages(binaryRow, target);
       } finally {
           rowHelper.resetIfTooLarge(binaryRow);
       }
   }
   ```
   
   And in `RowHelper`:
   
   ```java
   public void resetIfTooLarge(BinaryRow currentRow) {
       if (currentRow == reuseRow
               && reuseWriter != null
               && reuseWriter.getSegments() != null
               && reuseWriter.getSegments().size() > REUSE_RELEASE_THRESHOLD
               && currentRow.getSizeInBytes() < REUSE_RELEASE_THRESHOLD) {
           reuseRow = null;
           reuseWriter = null;
       }
   }
   ```
   
   This does not solve every path by itself, but it avoids using stale 
`reuseRow.getSizeInBytes()` when the current input is already a `BinaryRow`. 
For nested rows, `AbstractBinaryWriter.writeRow()` would need a similar 
after-use hook on the nested serializer, for example:
   
   ```java
   BinaryRow row = serializer.toBinaryRow(input);
   try {
       writeSegmentsToVarLenPart(pos, row.getSegments(), row.getOffset(), 
row.getSizeInBytes());
   } finally {
       serializer.resetIfTooLarge(row);
   }
   ```
   
   The exact API name can be different, but I think this makes the intended 
ownership clearer: only the serializer/helper that produced the reusable row is 
responsible for deciding whether to drop that reusable buffer after the current 
use.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to