geruh commented on code in PR #2927:
URL: https://github.com/apache/iceberg-python/pull/2927#discussion_r2706510676


##########
pyiceberg/catalog/hive.py:
##########
@@ -551,11 +551,24 @@ def commit_table(
 
                 if hive_table and current_table:
                     # Table exists, update it.
-                    hive_table.parameters = _construct_parameters(
+                    new_parameters = _construct_parameters(
                         
metadata_location=updated_staged_table.metadata_location,
                         
previous_metadata_location=current_table.metadata_location,
                         metadata_properties=updated_staged_table.properties,
                     )
+
+                    # Detect properties that were removed from Iceberg metadata
+                    old_iceberg_keys = set(current_table.properties.keys())
+                    new_iceberg_keys = 
set(updated_staged_table.properties.keys())
+                    removed_keys = old_iceberg_keys - new_iceberg_keys
+
+                    # Merge HMS parameters: preserve HMS-only properties 
(e.g., table_category) while
+                    # ensuring Iceberg controls its metadata. Start with HMS 
params, remove deleted
+                    # Iceberg properties, then update with new Iceberg values 
(which take precedence).
+                    updated_parameters = {k: v for k, v in 
hive_table.parameters.items() if k not in removed_keys}

Review Comment:
   Not blocking as this solves the issue, but one suggestion would be to 
extract the merge logic into a small helper function like 
`_merge_hms_parameters()`. That way we could unit test without mocking the full 
commit flow, and document the semantics in one spot. 



##########
pyiceberg/catalog/hive.py:
##########
@@ -551,11 +551,24 @@ def commit_table(
 
                 if hive_table and current_table:
                     # Table exists, update it.
-                    hive_table.parameters = _construct_parameters(
+                    new_parameters = _construct_parameters(
                         
metadata_location=updated_staged_table.metadata_location,
                         
previous_metadata_location=current_table.metadata_location,
                         metadata_properties=updated_staged_table.properties,
                     )
+
+                    # Detect properties that were removed from Iceberg metadata
+                    old_iceberg_keys = set(current_table.properties.keys())
+                    new_iceberg_keys = 
set(updated_staged_table.properties.keys())
+                    removed_keys = old_iceberg_keys - new_iceberg_keys
+
+                    # Merge HMS parameters: preserve HMS-only properties 
(e.g., table_category) while
+                    # ensuring Iceberg controls its metadata. Start with HMS 
params, remove deleted
+                    # Iceberg properties, then update with new Iceberg values 
(which take precedence).
+                    updated_parameters = {k: v for k, v in 
hive_table.parameters.items() if k not in removed_keys}

Review Comment:
   nit: params could be None here. 
   
   
https://github.com/apache/iceberg/blob/369fe89138ca52ef9437fe657324c659889a8d88/hive-metastore/src/main/java/org/apache/iceberg/hive/HMSTablePropertyHelper.java#L79



##########
pyiceberg/catalog/hive.py:
##########
@@ -551,11 +551,24 @@ def commit_table(
 
                 if hive_table and current_table:
                     # Table exists, update it.
-                    hive_table.parameters = _construct_parameters(
+                    new_parameters = _construct_parameters(
                         
metadata_location=updated_staged_table.metadata_location,
                         
previous_metadata_location=current_table.metadata_location,
                         metadata_properties=updated_staged_table.properties,
                     )
+
+                    # Detect properties that were removed from Iceberg metadata
+                    old_iceberg_keys = set(current_table.properties.keys())

Review Comment:
   nit: can simplify to `current_table.properties.keys() - 
updated_staged_table.properties.keys()`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to