JingsongLi commented on code in PR #8191:
URL: https://github.com/apache/paimon/pull/8191#discussion_r3386901470


##########
paimon-core/src/main/java/org/apache/paimon/table/FormatTable.java:
##########
@@ -77,6 +77,7 @@ public interface FormatTable extends Table {
     enum Format {
         ORC,
         PARQUET,
+        BLOB,

Review Comment:
   Adding `BLOB` here also exposes format-table projection paths. For a table 
like `(payload BLOB, ds INT) PARTITIONED BY (ds)`, projecting only `ds` makes 
`FormatReadBuilder` remove partition columns before creating the file reader, 
so the `projectedRowType` passed to `BlobFileFormat` is empty. `BlobFileFormat` 
currently requires a BLOB field and throws, whereas other format tables can 
satisfy partition-only projections. Please handle this case, for example by 
reading only the blob file metadata to get the row count and then appending 
partition columns, or by adding an explicit supported projection path with a 
test.



##########
paimon-core/src/main/java/org/apache/paimon/io/FormatTableSingleFileWriter.java:
##########
@@ -64,6 +65,10 @@ public FormatTableSingleFileWriter(
                 out = fileIO.newTwoPhaseOutputStream(path, false);
                 writer = factory.create(out, compression);
             }
+            if (writer instanceof FileAwareFormatWriter) {
+                FileAwareFormatWriter fileAwareFormatWriter = 
(FileAwareFormatWriter) writer;
+                fileAwareFormatWriter.setFile(path);

Review Comment:
   `setFile(path)` is not enough for the `withBlobConsumer` path. 
`BlobFormatWriter` invokes the consumer while writing and the emitted 
descriptor points at this target path, but this writer is backed by a 
`TwoPhaseOutputStream`, so the target file is not visible until 
`FormatTableCommit` commits it; if a later write/commit fails, 
`abort()`/`FormatTableCommit.abort()` discards it anyway. This violates the 
`TableWrite.withBlobConsumer` contract that these files are left for the caller 
to clean up, and leaves already-emitted descriptors dangling. Please either 
make the consumer path use visible/non-deleted files like `SingleFileWriter` 
does with `deleteFileUponAbort()`, or defer/avoid emitting descriptors until 
the file has actually been committed, and add a failure-path test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to