Re: [PR] Add commit retry and concurrency validation for writes [iceberg-python]

via GitHub Mon, 01 Jun 2026 12:59:14 -0700


rambleraptor commented on code in PR #3320:
URL: https://github.com/apache/iceberg-python/pull/3320#discussion_r3336736277



##########
mkdocs/docs/configuration.md:
##########
@@ -85,6 +85,26 @@ Iceberg tables support table properties to configure table 
behavior.
 
 <!-- prettier-ignore-end -->
 
+### Commit retry options
+
+When a concurrent commit is detected, PyIceberg automatically retries the 
operation with exponential backoff. If the retry detects a real data conflict 
(e.g. concurrent deletes on the same partition), it raises 
`ValidationException` instead of retrying.
+
+| Key                              | Options          | Default   | 
Description                                                        |
+| -------------------------------- | ---------------- | --------- | 
------------------------------------------------------------------ |
+| `commit.retry.num-retries`       | Integer          | 4         | Maximum 
number of retry attempts after a commit conflict            |
+| `commit.retry.min-wait-ms`      | Integer (ms)     | 100       | Minimum 
wait time before the first retry                            |

Review Comment:
   Where did you get these defaults from? (Fine to say you made them up!)



##########
pyiceberg/table/__init__.py:
##########
@@ -1036,17 +1073,78 @@ def commit_transaction(self) -> Table:
             The table with the updates applied.
         """
         if len(self._updates) > 0:
-            self._requirements += 
(AssertTableUUID(uuid=self.table_metadata.table_uuid),)
-            self._table._do_commit(  # pylint: disable=W0212
-                updates=self._updates,
-                requirements=self._requirements,
+            from pyiceberg.utils.properties import property_as_int

Review Comment:
   Is there any reason we can't import this at the top?



##########
pyiceberg/table/update/snapshot.py:
##########
@@ -499,6 +549,62 @@ def files_affected(self) -> bool:
         """Indicate if any manifest-entries can be dropped."""
         return len(self._deleted_entries()) > 0
 
+    def _refresh_for_retry(self) -> None:
+        """Reset state for a retry attempt, clearing the cached delete 
computation."""
+        super()._refresh_for_retry()
+        # Clear @cached_property by removing it from the instance __dict__.
+        # _compute_deletes depends on _parent_snapshot_id which changes on 
retry.
+        if "_compute_deletes" in self.__dict__:
+            del self.__dict__["_compute_deletes"]
+
+    def _validate_concurrency(self) -> None:
+        """Validate that concurrent changes do not conflict with this delete.
+
+        Note: This method is intentionally duplicated in _OverwriteFiles 
rather than
+        extracted to the base class. While the logic is currently identical, 
Java Iceberg's
+        BaseOverwriteFiles and BaseRowDelta have divergent validation. Keeping 
them separate
+        makes it easier to add RowDelta-specific validation in the future.
+        """
+        from pyiceberg.table import TableProperties
+        from pyiceberg.table.snapshots import IsolationLevel
+        from pyiceberg.table.update.validate import (
+            _validate_added_data_files,
+            _validate_deleted_data_files,
+            _validate_no_new_delete_files,
+            _validate_no_new_deletes_for_data_files,
+        )
+
+        if self._parent_snapshot_id is None:
+            return
+
+        table = self._transaction._table
+        parent_snapshot = 
table.metadata.snapshot_by_id(self._parent_snapshot_id)

Review Comment:
   Can we make a helper function? Something like `get_parent_snapshot`?
   
   In general, there's a lot of tricky logic here and I'm looking for places to 
help us make it easier to understand. 



##########
pyiceberg/table/update/snapshot.py:
##########
@@ -499,6 +549,62 @@ def files_affected(self) -> bool:
         """Indicate if any manifest-entries can be dropped."""
         return len(self._deleted_entries()) > 0
 
+    def _refresh_for_retry(self) -> None:
+        """Reset state for a retry attempt, clearing the cached delete 
computation."""
+        super()._refresh_for_retry()
+        # Clear @cached_property by removing it from the instance __dict__.
+        # _compute_deletes depends on _parent_snapshot_id which changes on 
retry.
+        if "_compute_deletes" in self.__dict__:
+            del self.__dict__["_compute_deletes"]
+
+    def _validate_concurrency(self) -> None:
+        """Validate that concurrent changes do not conflict with this delete.
+
+        Note: This method is intentionally duplicated in _OverwriteFiles 
rather than
+        extracted to the base class. While the logic is currently identical, 
Java Iceberg's
+        BaseOverwriteFiles and BaseRowDelta have divergent validation. Keeping 
them separate
+        makes it easier to add RowDelta-specific validation in the future.

Review Comment:
   Alright, I'm reading through this comment now. Couple thoughts:
   
   1. If Java's implementation has diverged, why aren't we diverging?
   2. If these methods are the same, we need to just have them be the same. If 
they diverge in the future, then we can separate them out.



##########
pyiceberg/table/update/snapshot.py:
##########
@@ -499,6 +549,62 @@ def files_affected(self) -> bool:
         """Indicate if any manifest-entries can be dropped."""
         return len(self._deleted_entries()) > 0
 
+    def _refresh_for_retry(self) -> None:
+        """Reset state for a retry attempt, clearing the cached delete 
computation."""
+        super()._refresh_for_retry()
+        # Clear @cached_property by removing it from the instance __dict__.
+        # _compute_deletes depends on _parent_snapshot_id which changes on 
retry.
+        if "_compute_deletes" in self.__dict__:
+            del self.__dict__["_compute_deletes"]
+
+    def _validate_concurrency(self) -> None:
+        """Validate that concurrent changes do not conflict with this delete.
+
+        Note: This method is intentionally duplicated in _OverwriteFiles 
rather than
+        extracted to the base class. While the logic is currently identical, 
Java Iceberg's
+        BaseOverwriteFiles and BaseRowDelta have divergent validation. Keeping 
them separate
+        makes it easier to add RowDelta-specific validation in the future.
+        """
+        from pyiceberg.table import TableProperties
+        from pyiceberg.table.snapshots import IsolationLevel
+        from pyiceberg.table.update.validate import (
+            _validate_added_data_files,
+            _validate_deleted_data_files,
+            _validate_no_new_delete_files,
+            _validate_no_new_deletes_for_data_files,
+        )
+
+        if self._parent_snapshot_id is None:
+            return
+
+        table = self._transaction._table
+        parent_snapshot = 
table.metadata.snapshot_by_id(self._parent_snapshot_id)
+        if parent_snapshot is None:
+            raise ValidationException(f"Cannot find parent snapshot 
{self._parent_snapshot_id} in table metadata")
+
+        starting_snapshot_id = self._starting_snapshot_id if 
self._starting_snapshot_id is not None else self._parent_snapshot_id

Review Comment:
   Same thing - a helper function like `fetch_starting_snapshot`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add commit retry and concurrency validation for writes [iceberg-python]

Reply via email to