rraulinio opened a new issue, #1153:
URL: https://github.com/apache/iceberg-go/issues/1153

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   ## Problem
   
   Repeated overwrite commits can keep carrying old manifests that no longer 
contribute any live entries to the current snapshot.
   
   In `overwriteFiles.existingManifests`, existing manifest entries are read 
with deleted entries filtered out:
   
   ```go
   for entry, err := range of.base.iterManifestEntries(m, true) {
       ...
   }
   ```
   
   If a carried manifest contains only deleted entries, the loop body never 
runs, so `notDeleted` stays empty and `foundDeletedCount` stays at 0. The 
current logic then appends the original manifest to `existingFiles`, so the 
delete-only manifest remains reachable from the new snapshot.
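
To make the failure mode concrete, here is a minimal, self-contained sketch of the behavior described above. It is a toy model, not the iceberg-go implementation: the `entry` and `manifest` types, `liveEntries` (standing in for `iterManifestEntries(m, true)`), and `carryForward` are hypothetical, and the rewrite branch is an assumption about what happens when some entries are removed.

```go
package main

import "fmt"

// entryStatus follows the manifest entry states in the Iceberg spec.
type entryStatus int

const (
	statusExisting entryStatus = iota
	statusAdded
	statusDeleted
)

// entry and manifest are hypothetical stand-ins for the real types.
type entry struct {
	path   string
	status entryStatus
}

type manifest struct {
	path    string
	entries []entry
}

// liveEntries stands in for iterManifestEntries(m, true):
// deleted entries are dropped before the caller sees them.
func liveEntries(m manifest) []entry {
	var out []entry
	for _, e := range m.entries {
		if e.status != statusDeleted {
			out = append(out, e)
		}
	}
	return out
}

// carryForward models the logic described in the issue. The rewrite
// branch is an assumption made only to keep the sketch complete.
func carryForward(manifests []manifest, toDelete map[string]bool) []manifest {
	var existingFiles []manifest
	for _, m := range manifests {
		var notDeleted []entry
		foundDeletedCount := 0
		for _, e := range liveEntries(m) {
			if toDelete[e.path] {
				foundDeletedCount++
				continue
			}
			notDeleted = append(notDeleted, e)
		}
		if foundDeletedCount == 0 {
			// Nothing was removed, so the original manifest is reused,
			// even when notDeleted is empty (the delete-only case).
			existingFiles = append(existingFiles, m)
			continue
		}
		if len(notDeleted) > 0 {
			existingFiles = append(existingFiles, manifest{path: m.path + ".rewritten", entries: notDeleted})
		}
	}
	return existingFiles
}

func main() {
	manifests := []manifest{
		{path: "delete-only.avro", entries: []entry{{path: "a.parquet", status: statusDeleted}}},
		{path: "live.avro", entries: []entry{{path: "b.parquet", status: statusAdded}}},
	}
	for _, m := range carryForward(manifests, nil) {
		fmt.Println(m.path) // both manifests are carried, including delete-only.avro
	}
}
```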
   
   ## Why This Matters
   
   For workloads that repeatedly replace the full table contents, every 
overwrite can create a delete-entry manifest for the previous files. Old 
delete-only manifests are then carried into later snapshots even though they do 
not affect current snapshot reads.
   
   Over time, the current snapshot's manifest list grows with stale delete-only 
manifests, and normal orphan cleanup cannot remove those Avro files because 
they are still reachable from the current snapshot.
   
   ## Expected Behavior
   
   After deleted entries are filtered out, a manifest with zero remaining live 
entries should not be carried forward into the next overwrite snapshot.
   
   This should not drop zero-count inherited manifests blindly. The check 
should be based on actually reading entries with `discardDeleted=true`, not on 
manifest count metadata, because some valid inherited manifests may have zero 
or unset counts but still contain live entries.
   
   ## Proposed Fix
   
   In the overwrite existing-manifest path, skip an existing manifest if 
`iterManifestEntries(m, true)` produces no live entries.
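
A rough sketch of the proposed check, again using hypothetical stand-in types rather than the real iceberg-go ones. The `liveCount` field is an assumed placeholder for manifest count metadata; it is deliberately ignored so that a manifest with a zero or unset count but real live entries is still kept.

```go
package main

import "fmt"

// entry and manifest are hypothetical stand-ins for the real types.
type entry struct {
	path    string
	deleted bool
}

type manifest struct {
	path      string
	liveCount int // assumed count metadata; may be zero or unset
	entries   []entry
}

// hasLiveEntries reads the entries themselves, mirroring a pass over
// iterManifestEntries(m, true) that yields at least one entry.
func hasLiveEntries(m manifest) bool {
	for _, e := range m.entries {
		if !e.deleted {
			return true
		}
	}
	return false
}

// carryForward keeps only manifests that still contribute live entries.
// It does not consult liveCount.
func carryForward(manifests []manifest) []manifest {
	var kept []manifest
	for _, m := range manifests {
		if !hasLiveEntries(m) {
			continue // delete-only manifest: do not carry it forward
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	manifests := []manifest{
		{path: "delete-only.avro", entries: []entry{{path: "a.parquet", deleted: true}}},
		{path: "inherited.avro", liveCount: 0, entries: []entry{{path: "b.parquet"}}},
	}
	for _, m := range carryForward(manifests) {
		fmt.Println(m.path)
	}
}
```

In this model only `inherited.avro` survives, even though its count metadata is zero, because the decision is based on the entries that were actually read.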
   
   Add a regression test with repeated `ReplaceDataFilesWithDataFiles` calls to 
verify the current snapshot keeps a bounded manifest list instead of 
accumulating old delete-only manifests.
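
The shape of such a regression check can be illustrated with a toy simulation. This does not call `ReplaceDataFilesWithDataFiles` or any iceberg-go API; `replaceAll` and `manifestCountAfter` are hypothetical helpers that model a full-table replace as "one delete manifest for the previous files plus one manifest for the new file", with a `skipEmpty` flag standing in for the proposed fix.

```go
package main

import "fmt"

// entry and manifest are hypothetical stand-ins for the real types.
type entry struct {
	path    string
	deleted bool
}

type manifest struct{ entries []entry }

// liveEntries returns the entries that are not marked deleted.
func liveEntries(m manifest) []entry {
	var out []entry
	for _, e := range m.entries {
		if !e.deleted {
			out = append(out, e)
		}
	}
	return out
}

// replaceAll models one overwrite that replaces every live file with newFile.
// With skipEmpty=false, manifests without live entries are carried forward.
func replaceAll(current []manifest, newFile string, skipEmpty bool) []manifest {
	var next []manifest
	var removed []entry
	for _, m := range current {
		live := liveEntries(m)
		if len(live) == 0 {
			if !skipEmpty {
				next = append(next, m) // stale delete-only manifest stays reachable
			}
			continue
		}
		// Every live file is replaced, so it becomes a deleted entry.
		for _, e := range live {
			removed = append(removed, entry{path: e.path, deleted: true})
		}
	}
	if len(removed) > 0 {
		next = append(next, manifest{entries: removed})
	}
	return append(next, manifest{entries: []entry{{path: newFile}}})
}

// manifestCountAfter runs the given number of overwrites and reports
// how many manifests the final snapshot references.
func manifestCountAfter(rounds int, skipEmpty bool) int {
	var manifests []manifest
	for i := 0; i < rounds; i++ {
		manifests = replaceAll(manifests, fmt.Sprintf("data-%d.parquet", i), skipEmpty)
	}
	return len(manifests)
}

func main() {
	fmt.Println("without fix:", manifestCountAfter(10, false))
	fmt.Println("with fix:   ", manifestCountAfter(10, true))
}
```

Under these assumptions the manifest list grows by one per overwrite without the skip and stays at two manifests with it, which is the kind of bound a real regression test could assert on.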


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]