wwj6591812 commented on code in PR #8219:
URL: https://github.com/apache/paimon/pull/8219#discussion_r3409252231


##########
paimon-format/src/main/java/org/apache/paimon/format/blob/BlobFormatWriter.java:
##########
@@ -97,21 +114,35 @@ public void addElement(InternalRow element) throws IOException {
         }
 
         long previousPos = out.getPos();
-        crc32.reset();
-
-        write(MAGIC_NUMBER_BYTES);
-
-        long blobPos = out.getPos();
+        ByteArrayOutputStream stagedPayload = new ByteArrayOutputStream();

Review Comment:
   @JingsongLi 
   
   Thanks for the follow-up. You're right that staging the full payload in 
memory is not acceptable for normal large blob writes.
   
   I've removed the ByteArrayOutputStream path and switched to the approach you 
suggested: open the source stream before appending anything to out. If opening 
fails with HTTP 404 and blob-write-null-on-missing-file is enabled, we record 
NULL without touching out; otherwise we keep the original streaming copy path 
(magic number + chunked read/write). This avoids the orphan-magic-byte issue 
for the common HTTP 404-at-open case without adding extra memory overhead for 
valid blobs.
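   
   For reference, here is a minimal, self-contained sketch of the 
open-before-write ordering described above. It is not the actual patch: the 
BlobSource interface, the writeBlob helper, the MAGIC bytes, and the 
nullOnMissing flag are hypothetical stand-ins, and FileNotFoundException is 
assumed here as the signal for an HTTP 404 at open time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class OpenBeforeWriteSketch {

    // Hypothetical placeholder; the real magic number lives in BlobFormatWriter.
    static final byte[] MAGIC = {'B', 'L', 'O', 'B'};

    /** Hypothetical abstraction over a blob source; not a Paimon interface. */
    interface BlobSource {
        InputStream open() throws IOException;
    }

    /**
     * Returns true if the blob was streamed to out, or false if the source was
     * missing and the caller should record NULL. In the false case nothing
     * has been written to out.
     */
    static boolean writeBlob(BlobSource source, OutputStream out, boolean nullOnMissing)
            throws IOException {
        InputStream opened;
        try {
            // Open the source first, before any byte is appended to out.
            opened = source.open();
        } catch (FileNotFoundException e) {
            // Assumption: a 404 at open surfaces as FileNotFoundException.
            if (nullOnMissing) {
                return false;
            }
            throw e;
        }

        // Original streaming path: magic number, then a chunked copy.
        try (InputStream in = opened) {
            out.write(MAGIC);
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Missing source: no bytes reach out, so no orphan magic number.
        boolean first = writeBlob(() -> {
            throw new FileNotFoundException("HTTP 404");
        }, out, true);
        System.out.println("written=" + first + ", size=" + out.size());

        // Valid source: magic number followed by the payload.
        boolean second = writeBlob(() -> new ByteArrayInputStream(new byte[] {1, 2, 3}), out, true);
        System.out.println("written=" + second + ", size=" + out.size());
    }
}
```

   Only a failure at open is handled in this sketch. A failure in the middle 
of the copy would still leave a partial record in out, which is why the 
description above is scoped to the 404-at-open case.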
   
   testMissingHttpBlobFollowedByValidBlobPreservesReadback still passes. Please 
take another look when you have a chance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

