ForeverAngry opened a new pull request, #2431:
URL: https://github.com/apache/iceberg-python/pull/2431

   <!--
   Thanks for opening a pull request!
   -->
   
   <!-- In the case this PR will resolve an issue, please replace 
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
   Related to #2409
   
   # Rationale for this change
   
   This PR fixes a thread safety issue in the `ManageSnapshots` class similar 
to the one identified in `ExpireSnapshots` (#2409). While the original issue 
specifically mentioned `ExpireSnapshots`, the same thread safety vulnerability 
exists in `ManageSnapshots` due to identical problematic design patterns.  The 
same testing methodology used in #2430 was adapted for this. 
   
   **Root Cause:** 
   The `ManageSnapshots` class had class-level attributes (`_updates`, 
`_requirements`) that were shared across all instances. When multiple threads 
created different `ManageSnapshots` instances for concurrent operations, they 
all shared the same underlying tuple objects for tracking updates and 
requirements.
   
   **Potential Impact:**
   - Thread 1: `table1.manage_snapshots().create_tag(...)` adds updates to 
shared tuple
   - Thread 2: `table2.manage_snapshots().create_branch(...)` adds updates to 
same shared tuple  
   - Result: Both threads would have mixed updates, potentially causing 
incorrect operations or failures
   
   **Solution:**
   Applied the same fix as ExpireSnapshots - moved the shared class-level 
attributes to instance-level attributes in the `__init__` method, ensuring each 
`ManageSnapshots` instance has its own isolated state.
   
   **Relationship to #2409:**
   While #2409 specifically reported ExpireSnapshots thread safety issues, this 
PR proactively addresses the same vulnerability pattern in ManageSnapshots to 
prevent similar issues from occurring with snapshot management operations 
(tags, branches, etc.).
   
   ## Are these changes tested?
   
   > 📢 🔥 Big shout-out to @QlikFrederic, as the testing methodology was largely 
derived from the testing and analysis done by the user! 🔥 📢 
   
   Yes, comprehensive test coverage has been added with a dedicated test file 
`test_manage_snapshots_thread_safety.py`:
   
   - **`test_manage_snapshots_thread_safety_fix()`** - Verifies that different 
ManageSnapshots instances have separate update/requirement tuples
   - **`test_manage_snapshots_concurrent_operations()`** - Tests concurrent 
operations don't contaminate each other  
   - **`test_manage_snapshots_concurrent_different_tables()`** - Tests 
concurrent operations on different tables work correctly
   - **`test_manage_snapshots_cross_table_isolation()`** - Validates no 
cross-contamination of operations between tables
   - **`test_manage_snapshots_concurrent_same_table_different_operations()`** - 
Tests concurrent operations on the same table
   
   All tests demonstrate that the thread safety fix works correctly and that 
concurrent ManageSnapshots operations maintain proper isolation.
   
   ## Are there any user-facing changes?
   
   **No breaking changes.** The public API remains identical:
   - All existing `ManageSnapshots` methods work the same way (`create_tag`, 
`create_branch`, `delete_tag`, etc.)
   - Method signatures are unchanged
   - Behavior is identical except for the thread safety improvement
   
   **Behavioral improvement:**
   - Concurrent `manage_snapshots()` operations on different tables now work 
correctly without interference
   - No risk of mixed updates/requirements between different ManageSnapshots 
instances in multi-threaded environments
   - Improved reliability for applications using ManageSnapshots in concurrent 
scenarios
   
   This is a pure bug fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to