abnobdoss commented on code in PR #3320:
URL: https://github.com/apache/iceberg-python/pull/3320#discussion_r3331196147


##########
tests/integration/test_writes/test_optimistic_concurrency.py:
##########


Review Comment:
   Could we add coverage for operations that stage more than one snapshot? For 
example, `Transaction.delete()` can create both `_DeleteFiles` and 
`_OverwriteFiles` when one file is fully deleted and another is partially 
rewritten.
   
   For example:
   ```python
       def test_mixed_delete_overwrite_starts_from_catalog_snapshot(catalog: 
Catalog) -> None:
           """Mixed full-file and partial deletes should validate from the 
original table snapshot."""
           catalog.create_namespace("default")
           schema = _test_schema()
           table = catalog.create_table("default.mixed_delete_start_snapshot", 
schema=schema)
       
           import pyarrow as pa
           from pyiceberg.table.update.snapshot import _DeleteFiles, 
_OverwriteFiles
       
           table.append(pa.table({"x": [1, 2]}))
           table.append(pa.table({"x": [2, 3]}))
       
           base_snapshot_id = table.metadata.current_snapshot_id
       
           tx = Transaction(table, autocommit=False)
           tx.delete("x <= 2")
       
           assert len(tx._snapshot_producers) == 2
       
           delete_producer, overwrite_producer = tx._snapshot_producers
           assert isinstance(delete_producer, _DeleteFiles)
           assert isinstance(overwrite_producer, _OverwriteFiles)
       
           assert delete_producer._starting_snapshot_id == base_snapshot_id
           assert overwrite_producer._starting_snapshot_id == base_snapshot_id
   ```



##########
pyiceberg/table/update/snapshot.py:
##########
@@ -499,6 +548,62 @@ def files_affected(self) -> bool:
         """Indicate if any manifest-entries can be dropped."""
         return len(self._deleted_entries()) > 0
 
+    def _refresh_for_retry(self) -> None:
+        """Reset state for a retry attempt, clearing the cached delete 
computation."""
+        super()._refresh_for_retry()
+        # Clear @cached_property by removing it from the instance __dict__.
+        # _compute_deletes depends on _parent_snapshot_id which changes on 
retry.
+        if "_compute_deletes" in self.__dict__:
+            del self.__dict__["_compute_deletes"]
+
+    def _validate_concurrency(self) -> None:
+        """Validate that concurrent changes do not conflict with this delete.
+
+        Note: This method is intentionally duplicated in _OverwriteFiles 
rather than
+        extracted to the base class. While the logic is currently identical, 
Java Iceberg's
+        BaseOverwriteFiles and BaseRowDelta have divergent validation. Keeping 
them separate
+        makes it easier to add RowDelta-specific validation in the future.
+        """
+        from pyiceberg.table import TableProperties
+        from pyiceberg.table.snapshots import IsolationLevel
+        from pyiceberg.table.update.validate import (
+            _validate_added_data_files,
+            _validate_deleted_data_files,
+            _validate_no_new_delete_files,
+            _validate_no_new_deletes_for_data_files,
+        )
+
+        if self._parent_snapshot_id is None:
+            return
+
+        table = self._transaction._table
+        parent_snapshot = 
table.metadata.snapshot_by_id(self._parent_snapshot_id)
+        if parent_snapshot is None:

Review Comment:
   Could we make these short-circuits a bit more targeted?
   
   `self._parent_snapshot_id is None` seems like a valid empty-table/new-branch 
case, but if a non-null parent or starting snapshot id cannot be resolved, I’m 
not sure we should silently skip validation. Would it make sense to raise in 
those cases, or otherwise distinguish expected no-snapshot cases from 
unexpected missing snapshots?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to