Fokko commented on code in PR #5686:
URL: https://github.com/apache/iceberg/pull/5686#discussion_r963099531


##########
python/pyiceberg/table/metadata.py:
##########
@@ -203,24 +210,23 @@ class TableMetadataV1(TableMetadataCommonFields, 
IcebergBaseModel):
     # because bumping the version should be an explicit operation that is up
     # to the owner of the table.
 
-    @root_validator(pre=True)
+    @root_validator
     def set_v2_compatible_defaults(cls, data: Dict[str, Any]) -> Dict[str, 
Any]:
         """Sets default values to be compatible with the format v2
 
-        Set some sensible defaults for V1, so we comply with the schema
-        this is in pre=True, meaning that this will be done before validation.
-        We don't want to make the fields optional, since they are required for 
V2
-
         Args:
             data: The raw arguments when initializing a V1 TableMetadata
 
         Returns:
             The TableMetadata with the defaults applied
         """
-        if data.get("schema") and "schema-id" not in data["schema"]:
-            data["schema"]["schema-id"] = DEFAULT_SCHEMA_ID
-        if data.get("partition-spec") and "last-partition-id" not in data:
-            data["last-partition-id"] = max(spec["field-id"] for spec in 
data["partition-spec"])
+        # When the schema doesn't have an ID
+        if data.get("schema") and "schema_id" not in data["schema"]:
+            data["schema"]["schema_id"] = DEFAULT_SCHEMA_ID
+
+        if data.get("partition_spec") and "last_partition_id" not in data:
+            data["last_partition_id"] = max(spec["field-id"] for spec in 
data["partition_spec"])

Review Comment:
   Thanks for asking this, and this is indeed a bug.. And also one of the 
reasons I don't like the pre-validators. The whole logic doesn't make sense 
because the `field-id` is on the fields of the spec.
   
   It didn't catch anything because `last_partition_id` is always in `data`. 
When running with pre-validation it could be empty because the defaults weren't 
applied.
   
   I've moved this to `construct_partition_specs` and threw in some walrus :=
   ```python
   if partition_specs := data.get(PARTITION_SPECS):
       data["last_partition_id"] = max(spec.last_assigned_field_id for spec in 
partition_specs)
   ```
   This way we set it every time we construct a table metadata.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to