wwj6591812 commented on code in PR #8219:
URL: https://github.com/apache/paimon/pull/8219#discussion_r3409252231
##########
paimon-format/src/main/java/org/apache/paimon/format/blob/BlobFormatWriter.java:
##########
@@ -97,21 +114,35 @@ public void addElement(InternalRow element) throws IOException {
}
long previousPos = out.getPos();
- crc32.reset();
-
- write(MAGIC_NUMBER_BYTES);
-
- long blobPos = out.getPos();
+ ByteArrayOutputStream stagedPayload = new ByteArrayOutputStream();
Review Comment:
@JingsongLi
Thanks for the follow-up. You're right that staging the full payload in
memory is not acceptable for normal large blob writes.
I've removed the ByteArrayOutputStream path and switched to the approach you
suggested: open the source stream before appending anything to out. If the open
fails with HTTP 404 and blob-write-null-on-missing-file is enabled, we record
NULL without touching out; otherwise we keep the original streaming copy path
(magic number, then chunked read/write). This avoids the orphan-magic-byte
issue for the common case where the 404 surfaces at open time, without adding
memory overhead for valid blobs.
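For readers following along, the ordering described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: the class `OpenBeforeWriteSketch`, the `BlobSource` interface, the `MAGIC` bytes, and the `-1` return value for NULL are all hypothetical, and a missing file is modelled as `FileNotFoundException` rather than an actual HTTP 404 check.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch of "open first, write later"; not the real BlobFormatWriter. */
final class OpenBeforeWriteSketch {

    /** Placeholder magic bytes, chosen for illustration only. */
    static final byte[] MAGIC = {'B', 'L', 'O', 'B'};

    /** Hypothetical stand-in for whatever opens the blob's source stream. */
    interface BlobSource {
        InputStream open() throws IOException;
    }

    /**
     * Returns the number of payload bytes copied, or -1 if the blob was recorded as NULL.
     * Assumption: a missing source surfaces as FileNotFoundException at open time.
     */
    static long writeBlob(OutputStream out, BlobSource source, boolean nullOnMissing)
            throws IOException {
        InputStream opened;
        try {
            // Open before writing anything, so a failure here leaves out untouched.
            opened = source.open();
        } catch (FileNotFoundException e) {
            if (nullOnMissing) {
                return -1L; // record NULL; no magic bytes were written
            }
            throw e;
        }
        try (InputStream in = opened) {
            out.write(MAGIC);
            byte[] buffer = new byte[4096];
            long total = 0;
            int n;
            // Chunked streaming copy: memory use is bounded by the buffer size.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```

The sketch only covers a failure at open time, which matches the case named above; an error raised partway through the copy would still leave partial bytes in out and would need separate handling.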
testMissingHttpBlobFollowedByValidBlobPreservesReadback still passes. Please
take another look when you have a chance.