JingsongLi commented on code in PR #8219:
URL: https://github.com/apache/paimon/pull/8219#discussion_r3417538887
##########
paimon-core/src/main/java/org/apache/paimon/append/ExternalStorageBlobWriter.java:
##########
@@ -201,12 +205,14 @@ private static ExternalStorageBlobFieldWriter createFieldWriter(
FileSource fileSource,
boolean asyncFileWrite,
boolean statsDenseStore,
- long targetFileSize) {
+ long targetFileSize,
+ boolean writeNullOnMissingFile) {
int fieldIndex = writeSchema.getFieldIndex(fieldName);
ExternalStorageBlobFieldWriter fieldWriter = new ExternalStorageBlobFieldWriter(fieldIndex);
BlobFileFormat blobFileFormat = new BlobFileFormat();
blobFileFormat.setWriteConsumer(fieldWriter);
+ blobFileFormat.setWriteNullOnMissingFile(writeNullOnMissingFile);
Review Comment:
This still does not make missing external-storage blobs become NULL in the
data file. When `BlobFormatWriter` hits the 404 fallback it calls
`BlobConsumer.accept(blobFieldName, null)`, so `writeAndReplace` returns
`null`. But `transformRow` only sets `overrideRow` when the descriptor is
non-null, and the final `FallbackMappingRow` treats a null override as "fall
back to the original row". As a result, after the external copy writes a NULL
entry, the normal writer still stores the original descriptor bytes in the
inline descriptor field; reads will later point back to the missing HTTP URL
instead of seeing NULL. We need to distinguish "no override" from "explicit
NULL override" here, and add a test for an external-storage descriptor field
where the source blob disappears/404s during the external copy.
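
To illustrate the distinction, here is a minimal sketch, not the actual Paimon code. `OverlayRow` is a hypothetical stand-in for the role `FallbackMappingRow` plays here, and its `override`/`get` methods are assumptions made for this example. The idea is to track "was this position overridden" separately from the override value, so that an explicit NULL is not mistaken for "no override":

```java
/** Sketch only: separates "no override" from "explicit NULL override". */
final class OverrideSketch {

    /** Hypothetical overlay row; not Paimon's FallbackMappingRow. */
    static final class OverlayRow {
        private final Object[] original;
        private final Object[] overrides;
        // true means "use overrides[pos]", even when that value is null
        private final boolean[] overridden;

        OverlayRow(Object[] original) {
            this.original = original;
            this.overrides = new Object[original.length];
            this.overridden = new boolean[original.length];
        }

        /** Records an override; a null value is a valid, explicit override. */
        void override(int pos, Object value) {
            overrides[pos] = value;
            overridden[pos] = true;
        }

        /** Falls back to the original row only when no override was recorded. */
        Object get(int pos) {
            return overridden[pos] ? overrides[pos] : original[pos];
        }
    }

    public static void main(String[] args) {
        // Position 1 holds a descriptor that points at a source blob.
        OverlayRow row = new OverlayRow(new Object[] {"id-1", "http://example.invalid/blob"});

        // Simulate the 404 fallback: the external copy produced no descriptor,
        // so the field is explicitly overridden with null.
        row.override(1, null);

        System.out.println(row.get(0));
        System.out.println(row.get(1));
    }
}
```

With a null check alone (`override != null ? override : original`), position 1 would still return the original descriptor, which is the behavior described above. A sentinel object would work as well as the boolean mask; the mask is just the smallest change to show.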