JingsongLi commented on code in PR #8191:
URL: https://github.com/apache/paimon/pull/8191#discussion_r3386901470
##########
paimon-core/src/main/java/org/apache/paimon/table/FormatTable.java:
##########
@@ -77,6 +77,7 @@ public interface FormatTable extends Table {
enum Format {
ORC,
PARQUET,
+ BLOB,
Review Comment:
Adding `BLOB` here also exposes format-table projection paths. For a table
like `(payload BLOB, ds INT) PARTITIONED BY (ds)`, projecting only `ds` makes
`FormatReadBuilder` remove partition columns before creating the file reader,
so the `projectedRowType` passed to `BlobFileFormat` is empty. `BlobFileFormat`
currently requires a BLOB field and throws, whereas other format tables can
satisfy partition-only projections. Please handle this case, for example by
reading only the blob file metadata to get the row count and then appending
partition columns, or by adding an explicit supported projection path with a
test.
##########
paimon-core/src/main/java/org/apache/paimon/io/FormatTableSingleFileWriter.java:
##########
@@ -64,6 +65,10 @@ public FormatTableSingleFileWriter(
out = fileIO.newTwoPhaseOutputStream(path, false);
writer = factory.create(out, compression);
}
+ if (writer instanceof FileAwareFormatWriter) {
+ FileAwareFormatWriter fileAwareFormatWriter =
(FileAwareFormatWriter) writer;
+ fileAwareFormatWriter.setFile(path);
Review Comment:
`setFile(path)` is not enough for the `withBlobConsumer` path.
`BlobFormatWriter` invokes the consumer while writing and the emitted
descriptor points at this target path, but this writer is backed by a
`TwoPhaseOutputStream`, so the target file is not visible until
`FormatTableCommit` commits it; if a later write/commit fails,
`abort()`/`FormatTableCommit.abort()` discards it anyway. This violates the
`TableWrite.withBlobConsumer` contract that these files are left for the caller
to clean up, and leaves already-emitted descriptors dangling. Please either
make the consumer path use visible/non-deleted files like `SingleFileWriter`
does with `deleteFileUponAbort()`, or defer/avoid emitting descriptors until
the file has actually been committed, and add a failure-path test.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]