tomlarkworthy opened a new pull request, #2773:
URL: https://github.com/apache/iceberg-python/pull/2773
I've successfully created a proof-of-concept demonstrating that PyIceberg
already supports writing equality delete files via transactions, even though
the read path is not yet implemented.
What I Discovered
1. No tests use actual equality_ids values - All existing tests either set
it to [] or None
2. The write infrastructure is complete and working - All necessary
components exist:
- DataFileContent.EQUALITY_DELETES enum
- equality_ids field in DataFile
- Snapshot tracking for equality deletes
- Manifest serialization
3. The key is using the transaction API directly:
    with table.transaction() as txn:
        update_snapshot = txn.update_snapshot()
        with update_snapshot.fast_append() as append_files:
            append_files.append_data_file(delete_file)  # Works for delete files!
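The transaction pattern above can be wrapped in a small helper. This is a minimal sketch, not code from the PR: `append_delete_files` is a hypothetical name, and it relies only on the calls already shown (`table.transaction()`, `update_snapshot()`, `fast_append()`, `append_data_file()`). Building the `DataFile` entries themselves (with `DataFileContent.EQUALITY_DELETES` content and a non-empty `equality_ids`) is assumed to happen elsewhere. The up-front check on `equality_ids` is an added assumption, not something the PR describes.

```python
from typing import Any, Iterable


def append_delete_files(table: Any, delete_files: Iterable[Any]) -> int:
    """Commit every file in `delete_files` to `table` in one transaction.

    Hypothetical helper: `table` is assumed to be a PyIceberg Table and each
    entry a DataFile describing an equality delete file.
    Returns the number of files appended.
    """
    files = list(delete_files)

    # Assumption: reject entries without equality_ids before opening the
    # transaction, since an equality delete needs at least one field ID.
    for f in files:
        if not getattr(f, "equality_ids", None):
            raise ValueError("equality delete file has no equality_ids")

    if not files:
        return 0

    # Same call sequence as the snippet above, looped over all files so
    # that they land in a single snapshot.
    with table.transaction() as txn:
        update_snapshot = txn.update_snapshot()
        with update_snapshot.fast_append() as append_files:
            for f in files:
                append_files.append_data_file(f)
    return len(files)
```

Validating before the transaction opens means a malformed entry fails without leaving a partially staged snapshot update.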
Files Created
1. test_equality_delete_poc.py - Detailed standalone test with verbose
output
2. test_add_equality_delete.py - Clean pytest suite with 2 passing tests:
- Single equality delete file
- Multiple delete files with different equality_ids
3. example_add_equality_delete.py - Complete working examples showing:
- Basic usage (single column)
- Composite keys (multiple columns)
- Multiple delete files in one transaction
4. EQUALITY_DELETE_POC_SUMMARY.md - Comprehensive documentation
Test Results
All tests pass successfully:
test_add_equality_delete.py::test_add_equality_delete_file_via_transaction PASSED
test_add_equality_delete.py::test_add_multiple_equality_delete_files_with_different_equality_ids PASSED
====== 2 passed in 1.06s ======
Key Takeaways
- ✅ You can write equality delete files today using the transaction API
- ✅ Single column deletes: equality_ids=[1]
- ✅ Composite key deletes: equality_ids=[1, 2]
- ✅ Multiple delete files can be added in one transaction
- ✅ Metadata tracking works correctly (snapshot summaries, manifests)
- ❌ Reading is blocked - raises ValueError when scanning tables with
equality deletes
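To make the single-column and composite-key cases concrete, here is a minimal pure-Python sketch of what a reader does with an equality delete: a data row is dropped when its values for every field listed in `equality_ids` match a row in the delete file. This does not use PyIceberg. `apply_equality_deletes` and the field-ID-to-column mapping are hypothetical, and the sketch ignores the sequence-number and partition checks a real reader also has to perform.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

# Assumed schema for illustration only: field ID -> column name.
FIELD_NAMES = {1: "id", 2: "region"}


def apply_equality_deletes(
    rows: List[Row], delete_rows: List[Row], equality_ids: Sequence[int]
) -> List[Row]:
    """Return the rows that survive an equality delete on `equality_ids`."""
    if not equality_ids:
        raise ValueError("equality_ids must name at least one field")
    cols = [FIELD_NAMES[i] for i in equality_ids]
    # Each delete row contributes one key; a data row is removed only when
    # its values for all listed columns match that key exactly.
    deleted = {tuple(d[c] for c in cols) for d in delete_rows}
    return [r for r in rows if tuple(r[c] for c in cols) not in deleted]


rows = [
    {"id": 1, "region": "eu"},
    {"id": 1, "region": "us"},
    {"id": 2, "region": "eu"},
]

# Single column, equality_ids=[1]: every row with id == 1 is removed.
single = apply_equality_deletes(rows, [{"id": 1}], [1])

# Composite key, equality_ids=[1, 2]: only the (1, "eu") row is removed.
composite = apply_equality_deletes(rows, [{"id": 1, "region": "eu"}], [1, 2])
```

The same delete value therefore removes more or fewer rows depending on which field IDs the delete file declares, which is why the tests cover files with different `equality_ids`.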
The write path is production-ready. Users who generate equality delete
files externally can add them to PyIceberg tables now, though they'll need
other tools (like Spark) to read those tables.
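Since scans currently fail on such tables, a caller may want to detect that case explicitly and fall back to another engine. A minimal sketch, assuming the failure surfaces as the `ValueError` mentioned above while the scan is planned or materialized: `scan_or_none` is a hypothetical helper, and catching `ValueError` this broadly can also hide unrelated errors.

```python
from typing import Any, Optional


def scan_or_none(table: Any) -> Optional[Any]:
    """Try a full-table scan; return None if the scan raises ValueError.

    Hypothetical helper: `table` is assumed to be a PyIceberg Table.
    """
    try:
        return table.scan().to_arrow()
    except ValueError as exc:
        # Assumption: this is the "equality deletes not supported" failure.
        print(f"Scan rejected, read this table with another engine: {exc}")
        return None
```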
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]