jerry-024 commented on code in PR #270:
URL: https://github.com/apache/paimon-rust/pull/270#discussion_r3115321536


##########
crates/paimon/src/spec/core_options.rs:
##########
@@ -369,6 +378,24 @@ impl<'a> CoreOptions<'a> {
             .and_then(|v| v.parse().ok())
             .unwrap_or(DEFAULT_DYNAMIC_BUCKET_TARGET_ROW_NUM)
     }
+
+    /// When true, blob field reads return serialized BlobDescriptor bytes
+    /// instead of actual blob bytes. Default is false.
+    pub fn blob_as_descriptor(&self) -> bool {
+        self.options
+            .get(BLOB_AS_DESCRIPTOR_OPTION)
+            .map(|v| v.eq_ignore_ascii_case("true"))
+            .unwrap_or(false)
+    }
+
+    /// Comma-separated BLOB field names stored as serialized BlobDescriptor
+    /// bytes inline in normal data files (no .blob files for these fields).
+    pub fn blob_descriptor_fields(&self) -> HashSet<String> {

Review Comment:
   `blob-descriptor-field` needs schema-level validation before we use this 
set. In Java, every configured field must exist and must be a top-level `BLOB` 
field. Here we only parse the option string, so typos or nested / non-BLOB 
fields are silently accepted and we diverge from Java behavior. Can we validate 
this during schema construction / table initialization instead of letting it 
flow into the writer path?



##########
crates/paimon/src/table/table_write.rs:
##########
@@ -249,13 +260,18 @@ impl TableWrite {
             ))
         };
 
+        let has_blob_fields = schema.fields().iter().any(|f| {
+            f.data_type().contains_blob_type() && 
!blob_descriptor_fields.contains(f.name())

Review Comment:
   This is broader than the Java blob contract. `contains_blob_type()` recurses 
into nested types, so a column like `ROW<blob BLOB>` now flips 
`has_blob_fields` and routes the whole top-level column into 
`AppendBlobFileWriter`, but the blob writer still only accepts a single 
top-level `BinaryArray`. That makes the new nested-blob acceptance path a false 
positive: the schema is accepted, then the write path fails at runtime. I think 
this needs to be tightened to top-level `DataType::Blob(_)` only (matching Java 
`BlobType.fieldsInBlobFile`) or explicitly rejected during schema validation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to