JingsongLi commented on code in PR #8219:
URL: https://github.com/apache/paimon/pull/8219#discussion_r3410550754


##########
paimon-format/src/main/java/org/apache/paimon/format/blob/BlobFormatWriter.java:
##########
@@ -96,22 +112,35 @@ public void addElement(InternalRow element) throws IOException {
             return;
         }
 
-        long previousPos = out.getPos();
-        crc32.reset();
+        SeekableInputStream in;
+        try {
+            in = blob.newInputStream();
+        } catch (IOException | RuntimeException e) {
+            if (writeNullOnMissingFile && isNotFoundError(e)) {
+                LOG.warn(
+                        "Failed to open blob from {}, writing NULL for BLOB field {}.",
+                        blobUri(blob),
+                        blobFieldName,
+                        e);
+                writeNullElement();
+                return;
+            }
+            throw e;
+        }
 
         write(MAGIC_NUMBER_BYTES);

Review Comment:
   Looks like `crc32.reset()` was dropped while removing the in-memory staging 
path. The CRC field is still written per blob record below, and the old 
streaming implementation reset the CRC before writing each record. Without that 
reset, the second and later non-null blobs will write a checksum that also 
includes bytes from previous blobs, so the blob file contains invalid 
per-record CRC values. Could we reset the CRC before writing 
`MAGIC_NUMBER_BYTES` for each non-null blob?
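
The effect described above can be reproduced outside Paimon with a minimal sketch built only on `java.util.zip.CRC32`. The class `CrcResetDemo` and its methods `standaloneCrc` and `sharedCrc` are hypothetical names chosen for illustration and are not part of `BlobFormatWriter`. The sketch assumes only that one `CRC32` instance is shared across records, as the comment implies.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class, not Paimon code.
final class CrcResetDemo {

    // Checksum of one record computed with a fresh CRC32:
    // the value a reader would expect for that record alone.
    static long standaloneCrc(byte[] record) {
        CRC32 crc = new CRC32();
        crc.update(record, 0, record.length);
        return crc.getValue();
    }

    // Checksums produced when a single CRC32 instance is reused
    // for every record, optionally resetting it before each one.
    static long[] sharedCrc(byte[][] records, boolean resetPerRecord) {
        CRC32 crc = new CRC32();
        long[] checksums = new long[records.length];
        for (int i = 0; i < records.length; i++) {
            if (resetPerRecord) {
                crc.reset();
            }
            crc.update(records[i], 0, records[i].length);
            checksums[i] = crc.getValue();
        }
        return checksums;
    }

    public static void main(String[] args) {
        byte[][] records = {
            "123456789".getBytes(StandardCharsets.US_ASCII),
            "blob-two".getBytes(StandardCharsets.US_ASCII)
        };
        long[] withoutReset = sharedCrc(records, false);
        long[] withReset = sharedCrc(records, true);
        for (int i = 0; i < records.length; i++) {
            System.out.printf(
                    "record %d: expected=%08x withoutReset=%08x withReset=%08x%n",
                    i, standaloneCrc(records[i]), withoutReset[i], withReset[i]);
        }
    }
}
```

In this sketch the first record matches in both modes, because the shared instance still starts from its initial state. From the second record onward only the variant that resets matches the standalone value, which is the behavior the requested `crc32.reset()` call would restore.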


