JingsongLi commented on PR #8159:
URL: https://github.com/apache/paimon/pull/8159#issuecomment-4645749480
To make the suggestion more concrete, the shape I had in mind is something
like this: let the serializer keep cleanup next to the conversion result, and
make `RowHelper` release only when the row being serialized is actually its
reusable row.
```java
@Override
public void serialize(InternalRow row, DataOutputView target) throws
IOException {
BinaryRow binaryRow = toBinaryRow(row);
try {
binarySerializer.serialize(binaryRow, target);
} finally {
rowHelper.resetIfTooLarge(binaryRow);
}
}
@Override
public int serializeToPages(InternalRow row, AbstractPagedOutputView target)
throws IOException {
BinaryRow binaryRow = toBinaryRow(row);
try {
return binarySerializer.serializeToPages(binaryRow, target);
} finally {
rowHelper.resetIfTooLarge(binaryRow);
}
}
```
And in `RowHelper`:
```java
public void resetIfTooLarge(BinaryRow currentRow) {
if (currentRow == reuseRow
&& reuseWriter != null
&& reuseWriter.getSegments() != null
&& reuseWriter.getSegments().size() > REUSE_RELEASE_THRESHOLD
&& currentRow.getSizeInBytes() < REUSE_RELEASE_THRESHOLD) {
reuseRow = null;
reuseWriter = null;
}
}
```
This does not solve every path by itself, but it avoids using stale
`reuseRow.getSizeInBytes()` when the current input is already a `BinaryRow`.
For nested rows, `AbstractBinaryWriter.writeRow()` would need a similar
after-use hook on the nested serializer, for example:
```java
BinaryRow row = serializer.toBinaryRow(input);
try {
writeSegmentsToVarLenPart(pos, row.getSegments(), row.getOffset(),
row.getSizeInBytes());
} finally {
serializer.resetIfTooLarge(row);
}
```
The exact API name can be different, but I think this makes the intended
ownership clearer: only the serializer/helper that produced the reusable row is
responsible for deciding whether to drop that reusable buffer after the current
use.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]