viirya opened a new pull request, #2545:
URL: https://github.com/apache/iceberg-rust/pull/2545

   ## Which issue does this PR close?
   
   - Closes #2148.
   
   ## What changes are included in this PR?
   
   `FastAppendOperation::existing_manifest()` filtered the current snapshot's 
manifest list entries with:
   
   ```rust
   .filter(|entry| entry.has_added_files() || entry.has_existing_files())
   ```
   
   This drops any manifest that contains **only** `Deleted` entries 
(`deleted_files_count > 0` while `added_files_count == existing_files_count == 
0`). A delete-only manifest is not empty — it records which files were removed 
and, per the spec, must persist across snapshots until `expire_snapshots` 
cleans it up. Dropping it on the next `fast_append` means the old manifests 
still carry `Added` entries for those files but there is no longer a delete 
manifest to exclude them, so the removed files reappear as live data. Repeated 
append/rewrite cycles compound this into exponential row growth (see #2148 for 
the reproduction).
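
The failure mode described above can be illustrated with a toy model. This is
only a sketch under stated assumptions, not the iceberg-rust API: `Status`,
`Manifest`, and `live_files` are hypothetical names, and the model simply
treats a `Deleted` entry as cancelling an `Added` entry for the same path, as
the description above does.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified entry status (not the iceberg-rust type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Added,
    Existing,
    Deleted,
}

/// Hypothetical manifest: a flat list of (status, file path) entries.
#[derive(Clone, Debug)]
pub struct Manifest {
    pub entries: Vec<(Status, &'static str)>,
}

/// Resolve live files in this toy model: every `Added`/`Existing` path
/// that is not matched by a `Deleted` entry in any manifest of the list.
pub fn live_files(manifests: &[Manifest]) -> BTreeSet<&'static str> {
    // Collect every path that some manifest records as deleted.
    let deleted: BTreeSet<&str> = manifests
        .iter()
        .flat_map(|m| m.entries.iter())
        .filter(|(status, _)| *status == Status::Deleted)
        .map(|(_, path)| *path)
        .collect();

    // Keep the non-deleted entries whose path was not cancelled above.
    manifests
        .iter()
        .flat_map(|m| m.entries.iter())
        .filter(|(status, path)| *status != Status::Deleted && !deleted.contains(path))
        .map(|(_, path)| *path)
        .collect()
}
```

In this model, a manifest list holding a data manifest with `a.parquet` and
`b.parquet` plus a delete-only manifest for `a.parquet` resolves to just
`b.parquet`. Once the delete-only manifest is dropped from the list,
`a.parquet` resolves as live again.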
   
   The fix adds `|| entry.has_deleted_files()` to the filter so delete-only 
manifests are carried forward.
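
A minimal, self-contained sketch of the filter before and after the change.
`ManifestSummary` and the two `carried_forward_*` functions are hypothetical
stand-ins for illustration; the real manifest list entry type in iceberg-rust
has more fields, and plain `u32` counts are assumed here for brevity.

```rust
/// Hypothetical stand-in for a manifest list entry (assumed plain counts).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ManifestSummary {
    pub path: &'static str,
    pub added_files_count: u32,
    pub existing_files_count: u32,
    pub deleted_files_count: u32,
}

impl ManifestSummary {
    pub fn has_added_files(&self) -> bool {
        self.added_files_count > 0
    }

    pub fn has_existing_files(&self) -> bool {
        self.existing_files_count > 0
    }

    pub fn has_deleted_files(&self) -> bool {
        self.deleted_files_count > 0
    }
}

/// Previous predicate: a delete-only manifest fails both checks and is dropped.
pub fn carried_forward_before(entries: &[ManifestSummary]) -> Vec<ManifestSummary> {
    entries
        .iter()
        .filter(|entry| entry.has_added_files() || entry.has_existing_files())
        .cloned()
        .collect()
}

/// Fixed predicate: the extra `has_deleted_files()` check keeps delete-only manifests.
pub fn carried_forward_after(entries: &[ManifestSummary]) -> Vec<ManifestSummary> {
    entries
        .iter()
        .filter(|entry| {
            entry.has_added_files() || entry.has_existing_files() || entry.has_deleted_files()
        })
        .cloned()
        .collect()
}
```

Given one data manifest and one delete-only manifest, the previous predicate
returns only the data manifest, while the fixed predicate returns both. A
manifest with all three counts at zero is still dropped by either predicate.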
   
   Note: this is currently a **latent** bug. No operation on `main` produces 
delete-only manifests yet (`FastAppendOperation::delete_entries` returns 
empty), so it is not end-to-end triggerable today. It becomes immediately 
triggerable once an action that produces delete-only manifests (e.g. 
rewrite/overwrite) lands. Fixing it now prevents that regression and locks in 
the correct behavior.
   
   ## Are these changes tested?
   
   Yes — a new unit test 
`test_existing_manifest_preserves_delete_only_manifest` builds a table whose 
current snapshot's manifest list contains a data manifest plus a delete-only 
manifest, calls `existing_manifest()`, and asserts the delete-only manifest is 
carried forward. The test fails on the previous filter and passes with the fix.
   
   - `cargo test -p iceberg --lib transaction::`
   - `cargo test -p iceberg --lib scan`
   - `cargo fmt -p iceberg -- --check`
   - `cargo clippy -p iceberg --tests`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

