jeroko commented on code in PR #2662:
URL: https://github.com/apache/iceberg-python/pull/2662#discussion_r2467182058


##########
pyiceberg/io/pyarrow.py:
##########
@@ -2624,8 +2627,41 @@ def _check_pyarrow_schema_compatible(
         )
         additional_names = set(provided_schema._name_to_id.keys()) - set(requested_schema._name_to_id.keys())
         raise ValueError(
-            f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. Update the schema first (hint, use union_by_name)."
+            f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. "
+            "Update the schema first (hint, use union_by_name)."
         ) from e
+
+    # If the file has explicit field IDs, validate they match the table schema exactly
+    if has_field_ids:
+        # Build mappings for both schemas (including nested fields)
+        requested_id_to_name = requested_schema._lazy_id_to_name
+        provided_id_to_name = provided_schema._lazy_id_to_name

Review Comment:
   @Fokko Right, we should not care about the names if the IDs are provided, and the mapping between the IDs and the types was already checked in the call to `_check_schema_compatible` at the end of this function. In that case I didn't really need to add any extra check.
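
   To illustrate the idea behind the comment, here is a minimal, self-contained sketch of ID-first matching. The `Field` dataclass and `ids_compatible` helper are hypothetical stand-ins, not PyIceberg's actual classes or the real `_check_schema_compatible` logic: the point is only that fields are resolved by ID while names are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical minimal field model: an explicit ID, a name, and a type string.
    field_id: int
    name: str
    type: str


def ids_compatible(requested: list[Field], provided: list[Field]) -> bool:
    """Return True when every provided field ID maps to a requested field
    with the same type; field names are deliberately ignored."""
    requested_by_id = {f.field_id: f for f in requested}
    for f in provided:
        match = requested_by_id.get(f.field_id)
        if match is None or match.type != f.type:
            return False
    return True


table_schema = [Field(1, "id", "long"), Field(2, "name", "string")]
file_schema = [Field(1, "identifier", "long"), Field(2, "label", "string")]
print(ids_compatible(table_schema, file_schema))  # → True: names differ, but IDs and types match
```

   The names differ entirely between the two schemas, yet the check passes because resolution happens on the IDs, which is why no extra name-based validation is needed once the ID-to-type mapping has been verified.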



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

