kevinjqliu commented on code in PR #2410:
URL: https://github.com/apache/iceberg-python/pull/2410#discussion_r2729448862


##########
pyiceberg/table/update/__init__.py:
##########
@@ -706,6 +706,12 @@ def update_table_metadata(
         if base_metadata.last_updated_ms == new_metadata.last_updated_ms:
             new_metadata = new_metadata.model_copy(update={"last_updated_ms": 
datetime_to_millis(datetime.now().astimezone())})
 
+    # Check correctness of partition spec, and sort order
+    new_metadata.spec().check_compatible(new_metadata.schema())
+
+    if sort_order := 
new_metadata.sort_order_by_id(new_metadata.default_sort_order_id):

Review Comment:
   nit: i think we can add a sort_order() function to TableMetadata class, i 
notice its only available in Table. but we can refactor later
   
   then we can do 
   ```
   new_metadata.sort_order().check_compatible(new_metadata.schema())
   ```



##########
pyiceberg/table/sorting.py:
##########
@@ -169,6 +170,16 @@ def __repr__(self) -> str:
         fields = f"{', '.join(repr(column) for column in self.fields)}, " if 
self.fields else ""
         return f"SortOrder({fields}order_id={self.order_id})"
 
+    def check_compatible(self, schema: Schema) -> None:
+        for field in self.fields:
+            source_field = schema._lazy_id_to_field.get(field.source_id)
+            if source_field is None:
+                raise ValidationError(f"Cannot find source column for sort 
field: {field}")
+            if not source_field.field_type.is_primitive:
+                raise ValidationError(f"Cannot sort by non-primitive source 
field: {source_field}")
+            if not field.transform.can_transform(source_field.field_type):
+                raise ValidationError(f"Invalid source type 
{source_field.field_type} for transform: {field.transform}")

Review Comment:
   nit: should we include `source_field` in the msg here? or do you think 
theres enough context already? 



##########
pyiceberg/partitioning.py:
##########
@@ -249,6 +250,37 @@ def partition_to_path(self, data: Record, schema: Schema) 
-> str:
         path = "/".join([field_str + "=" + value_str for field_str, value_str 
in zip(field_strs, value_strs, strict=True)])
         return path
 
+    def check_compatible(self, schema: Schema, allow_missing_fields: bool = 
False) -> None:

Review Comment:
   nit: doesnt look like `allow_missing_fields` is used anywhere, should we 
still keep it? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to