Re: [PR] [WIP] [core] Introduce BLOB_REF for shared blob data [paimon]

via GitHub Mon, 06 Apr 2026 21:07:12 -0700


leaves12138 commented on code in PR #7602:
URL: https://github.com/apache/paimon/pull/7602#discussion_r3042801633



##########
paimon-spark/paimon-spark-common/src/main/java/org/apache/paimon/spark/SparkCatalog.java:
##########
@@ -495,6 +497,11 @@ private Schema toInitialSchema(
                         field.dataType() instanceof org.apache.spark.sql.types.BinaryType,
                         "The type of blob field must be binary");
                 type = new BlobType();
+            } else if (blobRefFields.contains(name)) {

Review Comment:
   Addressed in `915465dc44`. `SparkInternalRow.blobFields(...)` now includes 
`BLOB_REF`, and both `SparkInternalRowWrapper#getBlob` and `SparkRow#getBlob` 
now decode through `BlobUtils.fromBytes(...)` with the `BlobReferenceLookup` 
resolver, so the V1/V2 write paths no longer wrap `BLOB_REF` bytes as plain 
`BlobData`.
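
A minimal sketch of the distinction described above, using hypothetical stand-in types rather than Paimon's actual classes: `FieldKind`, `decode`, and the string-keyed resolver are all assumptions made for illustration, and the real `BlobUtils.fromBytes(...)` / `BlobReferenceLookup` signatures may differ. The point is only that `BLOB_REF` bytes name shared data and have to be resolved, while `BLOB` bytes are already the payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Function;

public class BlobDecodeSketch {
    // Hypothetical stand-in for the two blob-like field kinds.
    enum FieldKind { BLOB, BLOB_REF }

    /**
     * Decodes the stored bytes of a blob-like field.
     * BLOB: the stored bytes are the payload itself.
     * BLOB_REF: the stored bytes encode a reference (here, assumed to be a
     * UTF-8 key) that must be resolved to the shared payload.
     */
    static byte[] decode(FieldKind kind, byte[] stored, Function<String, byte[]> resolver) {
        if (kind == FieldKind.BLOB_REF) {
            String ref = new String(stored, StandardCharsets.UTF_8);
            byte[] resolved = resolver.apply(ref);
            if (resolved == null) {
                throw new IllegalStateException("Unresolved blob reference: " + ref);
            }
            return resolved;
        }
        // Plain BLOB: wrap the bytes as-is.
        return stored;
    }

    public static void main(String[] args) {
        Map<String, byte[]> shared =
                Map.of("ref-1", "payload".getBytes(StandardCharsets.UTF_8));
        byte[] stored = "ref-1".getBytes(StandardCharsets.UTF_8);

        // Resolved through the lookup.
        System.out.println(new String(
                decode(FieldKind.BLOB_REF, stored, shared::get), StandardCharsets.UTF_8));
        // Treated as plain blob data: the reference bytes leak through unchanged.
        System.out.println(new String(
                decode(FieldKind.BLOB, stored, shared::get), StandardCharsets.UTF_8));
    }
}
```

The second call shows the failure mode the fix avoids: handling a `BLOB_REF` field as a plain blob hands back the serialized reference instead of the data it points to.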



##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/FlinkCatalog.java:
##########
@@ -1077,9 +1088,13 @@ private static org.apache.paimon.types.DataType resolveDataType(
             org.apache.flink.table.types.logical.LogicalType logicalType,
             Map<String, String> options) {
         List<String> blobFields = CoreOptions.blobField(options);
+        List<String> blobRefFields = CoreOptions.blobRefField(options);
         if (blobFields.contains(fieldName)) {
             return toBlobType(logicalType);
         }
+        if (blobRefFields.contains(fieldName)) {

Review Comment:
   Addressed in `915465dc44`. `FileStoreSourceSplitReader` now treats 
`BLOB_REF` the same as `BLOB` when selecting the blob-aware row wrapper, so the 
Flink source path no longer returns raw serialized `BlobReference` bytes and 
`blob-as-descriptor` applies consistently.



##########
paimon-format/src/main/java/org/apache/paimon/format/orc/writer/FieldWriterFactory.java:
##########
@@ -264,6 +265,18 @@ public FieldWriter visit(BlobType blobType) {
         };
     }
 
+    @Override
+    public FieldWriter visit(BlobRefType blobRefType) {

Review Comment:
   Addressed in `915465dc44`. `OrcTypeUtil.convertToOrcType(...)` now maps 
`BLOB_REF` to ORC binary before the writer path, and I added `OrcTypeUtilTest` 
coverage for the new type.
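
As a rough illustration of why the schema conversion has to know the type before any field writer runs, the sketch below maps logical kinds to physical storage kinds and fails loudly on anything unmapped. `LogicalKind`, `PhysicalKind`, and `toPhysical` are hypothetical names for this example only; they are not the `OrcTypeUtil` API.

```java
public class OrcMappingSketch {
    // Hypothetical logical and physical type kinds, for illustration only.
    enum LogicalKind { STRING, BINARY, BLOB_REF }
    enum PhysicalKind { STRING, BINARY }

    /** Maps a logical kind to its physical storage kind. */
    static PhysicalKind toPhysical(LogicalKind kind) {
        switch (kind) {
            case STRING:
                return PhysicalKind.STRING;
            case BINARY:
            case BLOB_REF:
                // The serialized reference is stored as opaque binary.
                return PhysicalKind.BINARY;
            default:
                // Without an explicit case, a new logical type would fail here,
                // before any field writer is created.
                throw new UnsupportedOperationException("Unsupported type: " + kind);
        }
    }

    public static void main(String[] args) {
        for (LogicalKind kind : LogicalKind.values()) {
            System.out.println(kind + " -> " + toPhysical(kind));
        }
    }
}
```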



##########
paimon-format/src/main/java/org/apache/paimon/format/parquet/reader/ParquetVectorUpdaterFactory.java:
##########
@@ -230,6 +231,11 @@ public UpdaterFactory visit(BlobType blobType) {
             };
         }
 
+        @Override
+        public UpdaterFactory visit(BlobRefType blobRefType) {

Review Comment:
   Addressed in `915465dc44`. I updated `ParquetSchemaConverter`, 
`ParquetRowDataWriter`, and `ParquetReaderUtil` so `BLOB_REF` is handled as 
reference bytes end-to-end on the Parquet schema/write/read path. While 
touching the format stack, I also filled the same schema/read/write gap for Avro.



