viirya opened a new pull request, #2545: URL: https://github.com/apache/iceberg-rust/pull/2545
## Which issue does this PR close?

- Closes #2148.

## What changes are included in this PR?

`FastAppendOperation::existing_manifest()` filtered the current snapshot's manifest list entries with:

```rust
.filter(|entry| entry.has_added_files() || entry.has_existing_files())
```

This drops any manifest that contains **only** `Deleted` entries (`deleted_files_count > 0` while `added_files_count == existing_files_count == 0`). A delete-only manifest is not empty: it records which files were removed and, per the spec, must persist across snapshots until `expire_snapshots` cleans it up. Dropping it on the next `fast_append` means the old manifests still carry `Added` entries for those files but there is no longer a delete manifest to exclude them, so the removed files reappear as live data. Repeated append/rewrite cycles compound this into exponential row growth (see #2148 for the reproduction).

The fix adds `|| entry.has_deleted_files()` to the filter so delete-only manifests are carried forward.

Note: this is currently a **latent** bug. No operation on `main` produces delete-only manifests yet (`FastAppendOperation::delete_entries` returns empty), so it is not end-to-end triggerable today. It becomes immediately triggerable once an action that produces delete-only manifests (e.g. rewrite/overwrite) lands. Fixing it now prevents that regression and locks in the correct behavior.

## Are these changes tested?

Yes. A new unit test, `test_existing_manifest_preserves_delete_only_manifest`, builds a table whose current snapshot's manifest list contains a data manifest plus a delete-only manifest, calls `existing_manifest()`, and asserts the delete-only manifest is carried forward. The test fails on the previous filter and passes with the fix.

- `cargo test -p iceberg --lib transaction::`
- `cargo test -p iceberg --lib scan`
- `cargo fmt -p iceberg -- --check`
- `cargo clippy -p iceberg --tests`
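To illustrate the filter change in isolation, here is a minimal, self-contained sketch. It does not use the real iceberg-rust types: `ManifestSummary`, `carry_forward_old`, `carry_forward_fixed`, and `sample_manifest_list` are hypothetical names introduced only for this example, and the file counts are modeled as plain `u32` values, which is a simplification. Only the three `has_*_files` predicate names and the shape of the filter expression come from the description above.

```rust
/// Hypothetical, simplified stand-in for a manifest list entry.
/// The real type in the crate is richer; counts are plain integers here.
#[derive(Debug, Clone, PartialEq)]
struct ManifestSummary {
    path: &'static str,
    added_files_count: u32,
    existing_files_count: u32,
    deleted_files_count: u32,
}

impl ManifestSummary {
    fn has_added_files(&self) -> bool {
        self.added_files_count > 0
    }

    fn has_existing_files(&self) -> bool {
        self.existing_files_count > 0
    }

    fn has_deleted_files(&self) -> bool {
        self.deleted_files_count > 0
    }
}

/// Previous behavior: a manifest with only `Deleted` entries is dropped.
fn carry_forward_old(entries: &[ManifestSummary]) -> Vec<ManifestSummary> {
    entries
        .iter()
        .filter(|entry| entry.has_added_files() || entry.has_existing_files())
        .cloned()
        .collect()
}

/// Fixed behavior: delete-only manifests are carried forward as well.
fn carry_forward_fixed(entries: &[ManifestSummary]) -> Vec<ManifestSummary> {
    entries
        .iter()
        .filter(|entry| {
            entry.has_added_files() || entry.has_existing_files() || entry.has_deleted_files()
        })
        .cloned()
        .collect()
}

/// Illustrative manifest list: one data manifest, one delete-only manifest,
/// and one manifest with no entries at all.
fn sample_manifest_list() -> Vec<ManifestSummary> {
    vec![
        ManifestSummary {
            path: "data.avro",
            added_files_count: 3,
            existing_files_count: 0,
            deleted_files_count: 0,
        },
        ManifestSummary {
            path: "delete-only.avro",
            added_files_count: 0,
            existing_files_count: 0,
            deleted_files_count: 2,
        },
        ManifestSummary {
            path: "empty.avro",
            added_files_count: 0,
            existing_files_count: 0,
            deleted_files_count: 0,
        },
    ]
}

fn main() {
    let entries = sample_manifest_list();

    // The old filter keeps only the data manifest, losing the delete records.
    let old = carry_forward_old(&entries);
    assert!(old.iter().all(|e| e.path != "delete-only.avro"));

    // The fixed filter keeps the delete-only manifest and still drops the
    // manifest that has no entries of any kind.
    let fixed = carry_forward_fixed(&entries);
    assert!(fixed.iter().any(|e| e.path == "delete-only.avro"));
    assert!(fixed.iter().all(|e| e.path != "empty.avro"));

    println!("old kept {} manifests, fixed kept {}", old.len(), fixed.len());
}
```

In this sketch, a manifest is discarded only when all three counts are zero, which matches the intent stated above: a delete-only manifest still carries information and should survive the next append.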
