nssalian commented on code in PR #2927:
URL: https://github.com/apache/iceberg-python/pull/2927#discussion_r2706442030


##########
pyiceberg/catalog/hive.py:
##########
@@ -551,11 +551,24 @@ def commit_table(
 
                 if hive_table and current_table:
                     # Table exists, update it.
-                    hive_table.parameters = _construct_parameters(
+                    new_parameters = _construct_parameters(
                         
metadata_location=updated_staged_table.metadata_location,
                         
previous_metadata_location=current_table.metadata_location,
                         metadata_properties=updated_staged_table.properties,
                     )
+
+                    # Detect properties that were removed from Iceberg metadata
+                    old_iceberg_keys = set(current_table.properties.keys())
+                    new_iceberg_keys = 
set(updated_staged_table.properties.keys())
+                    removed_keys = old_iceberg_keys - new_iceberg_keys
+
+                    # Merge HMS parameters: preserve HMS-only properties 
(e.g., table_category) while
+                    # ensuring Iceberg controls its metadata. Start with HMS 
params, remove deleted
+                    # Iceberg properties, then update with new Iceberg values 
(which take precedence).
+                    updated_parameters = {k: v for k, v in 
hive_table.parameters.items() if k not in removed_keys}

Review Comment:
   While this looks fine, I think it would be cleaner if we started with the 
existing, removed and then updated to have the final params. Something like 
this:
   ```
   merged_params = dict(hive_table.parameters)
   for key in removed_keys:
       merged_params.pop(key, None)
   merged_params.update(new_parameters)
   hive_table.parameters = merged_params
   ```



##########
tests/integration/test_reads.py:
##########
@@ -135,6 +135,118 @@ def test_hive_properties(catalog: Catalog) -> None:
         assert hive_table.parameters.get("abc") is None
 
 
[email protected]
[email protected]("catalog", 
[pytest.lazy_fixture("session_catalog_hive")])
+def test_hive_preserves_hms_specific_properties(catalog: Catalog) -> None:
+    """Test that HMS-specific table properties are preserved during table 
commits.
+
+    This verifies that HMS-specific properties that are not managed by Iceberg
+    are preserved during commits, rather than being lost.
+
+    Regression test for: https://github.com/apache/iceberg-python/issues/2926
+    """
+    table = create_table(catalog)
+    hive_client: _HiveClient = _HiveClient(catalog.properties["uri"])
+    with hive_client as open_client:
+        hive_table = open_client.get_table(*TABLE_NAME)
+        # Add HMS-specific properties that aren't managed by Iceberg
+        hive_table.parameters["table_category"] = "production"
+        hive_table.parameters["data_owner"] = "data_team"
+        open_client.alter_table(TABLE_NAME[0], TABLE_NAME[1], hive_table)
+
+    with hive_client as open_client:
+        hive_table = open_client.get_table(*TABLE_NAME)
+        assert hive_table.parameters.get("table_category") == "production"
+        assert hive_table.parameters.get("data_owner") == "data_team"
+
+    table.transaction().set_properties({"iceberg_property": 
"new_value"}).commit_transaction()
+
+    # Verify that HMS-specific properties are STILL present after commit
+    with hive_client as open_client:
+        hive_table = open_client.get_table(*TABLE_NAME)
+        # HMS-specific properties should be preserved
+        assert hive_table.parameters.get("table_category") == "production", (
+            "HMS property 'table_category' was lost during commit!"
+        )
+        assert hive_table.parameters.get("data_owner") == "data_team", "HMS 
property 'data_owner' was lost during commit!"
+        # Iceberg properties should also be present
+        assert hive_table.parameters.get("iceberg_property") == "new_value"
+
+
[email protected]
[email protected]("catalog", 
[pytest.lazy_fixture("session_catalog_hive")])
+def test_hive_iceberg_property_takes_precedence(catalog: Catalog) -> None:

Review Comment:
   I think this should be named better. It's more like Iceberg's metadata is 
the source of truth. CC: @kevinjqliu  to see what the right expected behavior 
is.



##########
tests/integration/test_reads.py:
##########
@@ -135,6 +135,118 @@ def test_hive_properties(catalog: Catalog) -> None:
         assert hive_table.parameters.get("abc") is None
 
 
[email protected]

Review Comment:
   Worth adding a test for property deletion to ensure it's not picked up from 
the old state.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to