JingsongLi commented on code in PR #8021:
URL: https://github.com/apache/paimon/pull/8021#discussion_r3348136677
##########
paimon-python/pypaimon/schema/schema.py:
##########
@@ -62,39 +62,8 @@ def from_pyarrow_schema(pa_schema: pa.Schema, partition_keys: Optional[List[str]
if field.name in pk_set:
field.type.nullable = False
- # Check if Blob type exists in the schema
- blob_names = [
- field.name for field in fields
- if 'blob' in str(field.type).lower()
- ]
-
- if blob_names:
- if options is None:
- options = {}
-
- if len(fields) <= len(blob_names):
- raise ValueError(
- "Table with BLOB type column must have other normal columns."
- )
-
- required_options = {
- CoreOptions.ROW_TRACKING_ENABLED.key(): 'true',
- CoreOptions.DATA_EVOLUTION_ENABLED.key(): 'true'
- }
-
- missing_options = []
- for key, expected_value in required_options.items():
- if key not in options or options[key] != expected_value:
- missing_options.append(f"{key}='{expected_value}'")
-
- if missing_options:
- raise ValueError(
- f"Schema contains Blob type but is missing required options: {', '.join(missing_options)}. "
- f"Please add these options to the schema."
- )
-
- if primary_keys is not None:
- raise ValueError("Blob type is not supported with primary key.")
+ # Validate Blob type fields in the schema
+ Schema._validate_blob_fields(fields, options, primary_keys)
Review Comment:
I agree that Python does not yet have a Java-style centralized validation
layer, but this PR is adding new invariants and new read/write behavior that
depends on them. Leaving the direct `Schema(...)` path unchecked means users
can commit invalid `blob-view-field` / `blob-descriptor-field` options today,
and the failure then moves to later writes/reads with much less clear errors.

This does not need a full validation refactor in this PR. A narrow fix would
be to call the same blob-field validation from the schema commit path (for
example in `SchemaManager.create_table` after the schema is materialized and
before `schema-0` is written), so both `Schema.from_pyarrow_schema(...)` and
direct `Schema(...)` creation enforce the same invariants.
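
To make the suggestion concrete, here is a rough, self-contained sketch of the shape I have in mind. It is not the actual pypaimon code: `Field`, `Schema`, and `SchemaManager` below are simplified stand-ins, `validate_blob_fields` is a hypothetical free function that mirrors only the checks visible in the removed hunk (it does not cover the new `blob-view-field` / `blob-descriptor-field` rules), and the option key strings are assumed values in place of `CoreOptions.*.key()`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed key strings; the real code would use CoreOptions.*.key().
ROW_TRACKING_ENABLED = "row-tracking.enabled"
DATA_EVOLUTION_ENABLED = "data-evolution.enabled"


@dataclass
class Field:
    """Simplified stand-in for a schema field."""
    name: str
    type: str  # e.g. "INT", "BLOB"


@dataclass
class Schema:
    """Simplified stand-in for a directly constructed Schema(...)."""
    fields: List[Field]
    primary_keys: Optional[List[str]] = None
    options: Dict[str, str] = field(default_factory=dict)


def validate_blob_fields(fields, options, primary_keys):
    """Hypothetical shared check, mirroring the logic removed in this hunk."""
    blob_names = [f.name for f in fields if "blob" in str(f.type).lower()]
    if not blob_names:
        return
    options = options or {}
    if len(fields) <= len(blob_names):
        raise ValueError(
            "Table with BLOB type column must have other normal columns.")
    required = {ROW_TRACKING_ENABLED: "true", DATA_EVOLUTION_ENABLED: "true"}
    missing = [f"{k}='{v}'" for k, v in required.items()
               if options.get(k) != v]
    if missing:
        raise ValueError(
            "Schema contains Blob type but is missing required options: "
            f"{', '.join(missing)}. Please add these options to the schema.")
    if primary_keys is not None:
        raise ValueError("Blob type is not supported with primary key.")


class SchemaManager:
    """Stand-in commit path; a list replaces the schema files on disk."""

    def __init__(self):
        self.committed = []

    def create_table(self, schema: Schema) -> int:
        # Validate after the schema is materialized and before anything
        # is persisted, so every construction path hits the same checks.
        validate_blob_fields(schema.fields, schema.options,
                             schema.primary_keys)
        self.committed.append(schema)  # stands in for writing schema-0
        return len(self.committed) - 1
```

With this shape, a `Schema(...)` built by hand with a BLOB column but without the two required options fails in `create_table` with a `ValueError`, and nothing is committed. `Schema.from_pyarrow_schema(...)` could keep its early call to the same function so users still get the error at construction time.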