rraulinio opened a new issue, #1153:
URL: https://github.com/apache/iceberg-go/issues/1153
### Apache Iceberg version
main (development)
### Please describe the bug 🐞
## Problem
Repeated overwrite commits can keep carrying old manifests that no longer
contribute any live entries to the current snapshot.
In `overwriteFiles.existingManifests`, existing manifest entries are read
with deleted entries filtered out (the `true` argument is `discardDeleted`):
```go
for entry, err := range of.base.iterManifestEntries(m, true) {
...
}
```
If a carried manifest contains only deleted entries, the loop yields nothing,
so `notDeleted` stays empty and `foundDeletedCount` stays at 0. The current
logic then appends the original manifest to `existingFiles`, so the
delete-only manifest remains reachable from the new snapshot.
## Why This Matters
For workloads that repeatedly replace the full table contents, every
overwrite can create a delete-entry manifest for the previous files. Old
delete-only manifests are then carried into later snapshots even though they do
not affect current snapshot reads.
Over time, the current snapshot's manifest list grows with stale delete-only
manifests and normal orphan cleanup cannot remove those Avro files because they
are still reachable.
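The growth can be illustrated with a toy model. This is only a sketch of the reasoning above, not a measurement of iceberg-go: each manifest is reduced to a single flag saying whether it still has live entries, and `manifestListLen` and `dropDeleteOnly` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// manifestListLen returns the size of the current snapshot's manifest list
// after the given number of full-table overwrites in this toy model.
// Each manifest is modelled only by whether it still has live entries.
func manifestListLen(overwrites int, dropDeleteOnly bool) int {
	manifests := []bool{true} // initial append: one manifest with live entries
	for i := 0; i < overwrites; i++ {
		var next []bool
		for _, hasLive := range manifests {
			switch {
			case hasLive:
				// All of its files are replaced, so it is rewritten
				// as a manifest holding only delete entries.
				next = append(next, false)
			case !dropDeleteOnly:
				// Reported behaviour: the stale delete-only manifest
				// is carried into the new snapshot unchanged.
				next = append(next, false)
			}
			// Otherwise the delete-only manifest is skipped.
		}
		next = append(next, true) // manifest for the newly added files
		manifests = next
	}
	return len(manifests)
}

func main() {
	for _, n := range []int{1, 10, 100} {
		fmt.Printf("overwrites=%d carried=%d skipped=%d\n",
			n, manifestListLen(n, false), manifestListLen(n, true))
	}
}
```

In this model the manifest list grows by one per overwrite when delete-only manifests are carried, and stays at two entries when they are skipped.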
## Expected Behavior
After deleted entries are filtered out, a manifest with zero remaining live
entries should not be carried forward into the next overwrite snapshot.
The fix should not blindly drop inherited manifests that report a zero count.
The check should be based on actually reading entries with
`discardDeleted=true`, not on manifest count metadata, because some valid
inherited manifests may have zero or unset counts but still contain live entries.
## Proposed Fix
In the overwrite existing-manifest path, skip an existing manifest if
`iterManifestEntries(m, true)` produces no live entries.
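A minimal sketch of that decision, using stand-in types rather than the real iceberg-go ones. `toyManifest`, `liveEntries`, and `carryForward` are hypothetical names for illustration; `liveEntries` plays the role of iterating with `discardDeleted=true`, and the actual change would live inside `overwriteFiles.existingManifests`.

```go
package main

import "fmt"

// entryStatus mirrors the three manifest entry states in the Iceberg spec.
type entryStatus int

const (
	statusExisting entryStatus = iota // 0
	statusAdded                       // 1
	statusDeleted                     // 2
)

// toyManifest is a hypothetical stand-in for a manifest file;
// it is not an iceberg-go type.
type toyManifest struct {
	path    string
	entries []entryStatus
}

// liveEntries counts the entries that survive when deleted entries are
// discarded. The count comes from the entries themselves, not from any
// count metadata stored alongside the manifest.
func liveEntries(m toyManifest) int {
	n := 0
	for _, s := range m.entries {
		if s != statusDeleted {
			n++
		}
	}
	return n
}

// carryForward returns the manifests that should stay reachable from the
// next overwrite snapshot: those with at least one live entry.
func carryForward(existing []toyManifest) []toyManifest {
	var kept []toyManifest
	for _, m := range existing {
		if liveEntries(m) == 0 {
			continue // delete-only manifest: nothing live to inherit
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	existing := []toyManifest{
		{path: "delete-only.avro", entries: []entryStatus{statusDeleted, statusDeleted}},
		{path: "live.avro", entries: []entryStatus{statusExisting, statusDeleted}},
	}
	for _, m := range carryForward(existing) {
		fmt.Println(m.path)
	}
}
```

Only `live.avro` is kept here; the manifest whose entries are all deleted is left out of the next snapshot's manifest list.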
Add a regression test with repeated `ReplaceDataFilesWithDataFiles` calls to
verify the current snapshot keeps a bounded manifest list instead of
accumulating old delete-only manifests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]