JingsongLi commented on code in PR #8021:
URL: https://github.com/apache/paimon/pull/8021#discussion_r3348136677


##########
paimon-python/pypaimon/schema/schema.py:
##########
@@ -62,39 +62,8 @@ def from_pyarrow_schema(pa_schema: pa.Schema, partition_keys: Optional[List[str]
                 if field.name in pk_set:
                     field.type.nullable = False
 
-        # Check if Blob type exists in the schema
-        blob_names = [
-            field.name for field in fields
-            if 'blob' in str(field.type).lower()
-        ]
-
-        if blob_names:
-            if options is None:
-                options = {}
-
-            if len(fields) <= len(blob_names):
-                raise ValueError(
-                    "Table with BLOB type column must have other normal columns."
-                )
-
-            required_options = {
-                CoreOptions.ROW_TRACKING_ENABLED.key(): 'true',
-                CoreOptions.DATA_EVOLUTION_ENABLED.key(): 'true'
-            }
-
-            missing_options = []
-            for key, expected_value in required_options.items():
-                if key not in options or options[key] != expected_value:
-                    missing_options.append(f"{key}='{expected_value}'")
-
-            if missing_options:
-                raise ValueError(
-                    f"Schema contains Blob type but is missing required options: {', '.join(missing_options)}. "
-                    f"Please add these options to the schema."
-                )
-
-            if primary_keys is not None:
-                raise ValueError("Blob type is not supported with primary key.")
+        # Validate Blob type fields in the schema
+        Schema._validate_blob_fields(fields, options, primary_keys)
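
For reference, the checks this hunk moves out of `from_pyarrow_schema` can be sketched as one standalone function. This is a minimal, dependency-free sketch under stated assumptions: `Field` is a hypothetical stand-in for pypaimon's field type, and the strings `row-tracking.enabled` / `data-evolution.enabled` are assumed values of `CoreOptions.ROW_TRACKING_ENABLED.key()` and `CoreOptions.DATA_EVOLUTION_ENABLED.key()`. The real body of `Schema._validate_blob_fields` is not shown in this hunk and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed option keys; pypaimon reads these from CoreOptions.*.key().
ROW_TRACKING_ENABLED = "row-tracking.enabled"
DATA_EVOLUTION_ENABLED = "data-evolution.enabled"


@dataclass
class Field:
    """Hypothetical minimal stand-in for a pypaimon schema field."""
    name: str
    type: str


def validate_blob_fields(fields: List[Field],
                         options: Optional[Dict[str, str]],
                         primary_keys: Optional[List[str]]) -> None:
    """Sketch of the blob checks removed from from_pyarrow_schema."""
    # Same detection heuristic as the removed code: type string contains "blob".
    blob_names = [f.name for f in fields if 'blob' in str(f.type).lower()]
    if not blob_names:
        return
    options = options or {}

    # A table cannot consist of BLOB columns only.
    if len(fields) <= len(blob_names):
        raise ValueError("Table with BLOB type column must have other normal columns.")

    # Row tracking and data evolution must both be explicitly enabled.
    required = {ROW_TRACKING_ENABLED: 'true', DATA_EVOLUTION_ENABLED: 'true'}
    missing = [f"{k}='{v}'" for k, v in required.items() if options.get(k) != v]
    if missing:
        raise ValueError(
            f"Schema contains Blob type but is missing required options: "
            f"{', '.join(missing)}. Please add these options to the schema.")

    # Mirrors the original check: any non-None primary_keys is rejected.
    if primary_keys is not None:
        raise ValueError("Blob type is not supported with primary key.")
```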

Review Comment:
   I agree that Python does not yet have a Java-style centralized validation 
layer, but this PR adds new invariants and new read/write behavior that 
depends on them. Leaving the direct `Schema(...)` path unchecked means users 
can already commit invalid `blob-view-field` / `blob-descriptor-field` options, 
and the failure only surfaces later, during writes or reads, with much less 
clear errors.
   
   This does not need a full validation refactor in this PR. A narrow fix would 
be to call the same blob-field validation from the schema commit path (for 
example in `SchemaManager.create_table` after the schema is materialized and 
before `schema-0` is written), so both `Schema.from_pyarrow_schema(...)` and 
direct `Schema(...)` creation enforce the same invariants.
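
   The suggested narrow fix can be sketched roughly as follows. Everything here is a hypothetical, minimal stand-in, not pypaimon's real classes: `Schema`, `SchemaManager` and `validate_blob_fields` are simplified, the validator is abbreviated to two of the blob checks, and the JSON written to `schema/schema-0` is illustrative rather than Paimon's actual schema file format. The point is only the ordering: validation runs on the materialized schema inside `create_table`, before `schema-0` is written, so a schema built directly with `Schema(...)` is rejected just like one built through `Schema.from_pyarrow_schema(...)`.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Schema:
    """Hypothetical minimal stand-in for pypaimon's Schema."""
    fields: Dict[str, str]  # column name -> type string
    options: Dict[str, str] = field(default_factory=dict)
    primary_keys: Optional[List[str]] = None


def validate_blob_fields(schema: Schema) -> None:
    """Abbreviated, hypothetical version of the blob checks (two of them)."""
    blob_cols = [n for n, t in schema.fields.items() if 'blob' in t.lower()]
    if not blob_cols:
        return
    if len(schema.fields) <= len(blob_cols):
        raise ValueError("Table with BLOB type column must have other normal columns.")
    if schema.primary_keys is not None:
        raise ValueError("Blob type is not supported with primary key.")


class SchemaManager:
    """Hypothetical stand-in that only models writing schema-0 to disk."""

    def __init__(self, table_path: str):
        self.schema_dir = os.path.join(table_path, "schema")

    def create_table(self, schema: Schema) -> str:
        # Validate before anything is persisted, so every construction path
        # (from_pyarrow_schema or a direct Schema(...)) hits the same checks.
        validate_blob_fields(schema)
        os.makedirs(self.schema_dir, exist_ok=True)
        path = os.path.join(self.schema_dir, "schema-0")
        # Illustrative JSON layout only, not Paimon's real schema format.
        with open(path, "w") as f:
            json.dump({"fields": schema.fields, "options": schema.options,
                       "primaryKeys": schema.primary_keys}, f)
        return path
```

   Hooking the check into the commit path rather than into each constructor means any future way of building a `Schema` inherits the same invariants without extra wiring.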



-- 
This is an automated message from the Apache Git Service.