j1wonpark opened a new pull request, #4231:
URL: https://github.com/apache/amoro/pull/4231
## Why are the changes needed?
When AMS expires snapshots on a single-ref Iceberg table, Iceberg
auto-selects
`IncrementalFileCleanup`. That strategy walks the current snapshot's ancestor
chain via `SnapshotUtil.ancestorIds`, and **if a parent snapshot is missing
from table metadata the walk terminates silently**. Snapshots below the break
are then treated as non-ancestors, so the `ADDED` entries of the manifests
they
wrote get reverted during cleanup. When such a manifest was superseded by a
merge but the data file it added is still carried over (as `EXISTING`) in a
manifest the current snapshot references, the cleanup deletes a data file
that
the live table still points at. Reads of the affected partitions then fail
with
`NotFoundException`.
We hit this in production: a tag pinned a snapshot off the main line, normal
expiration removed the snapshots that linked it back to the head (safe at
that
point because `refs >= 2` forced `ReachableFileCleanup`), and once the tag
was
dropped the table fell back to a single ref. The next expiration ran
`IncrementalFileCleanup`, its ancestor walk truncated at the gap, and data
files
still referenced by the current snapshot were physically deleted.
### Mechanism
```mermaid
flowchart TB
HEAD["current snapshot (head)"]
S_b["S_b"]
S_a["S_a (off main line)"]
M_merged["M_merged : F = EXISTING"]
M_a["M_a : F = ADDED"]
F["data file F"]
CUT["walk truncates - S_a unreachable"]
DEL["F deleted"]
LOSS["NotFoundException"]
HEAD --> S_b
S_b -.->|"parent missing, walk stops"| CUT
HEAD -->|"manifest list"| M_merged
S_a -->|"manifest list"| M_a
M_merged --> F
M_a --> F
S_a ==>|"treated as expirable, M_a reverted"| DEL
F -.-> DEL
DEL -.->|"M_merged still references F"| LOSS
```
The same data file `F` is referenced through two manifests — `M_a` (`ADDED`,
written by the off-line snapshot) and `M_merged` (`EXISTING`, carried over
and
still referenced by the head). When the walk drops `S_a`, `M_a` becomes a
revert
target, and cleanup deletes `F` without checking that `M_merged` keeps it
alive.
### Upstream status
This is the same root cause as Iceberg issue
[apache/iceberg#13568](https://github.com/apache/iceberg/issues/13568),
fixed by [apache/iceberg#13614](https://github.com/apache/iceberg/pull/13614)
(milestone **1.10.0**). That fix hardens the strategy selector in
`RemoveSnapshots` — `validateCleanupCanBeIncremental()` only allows
incremental
cleanup when no snapshots are explicitly removed, all removed snapshots are
ancestors of main (`hasRemovedNonMainAncestors`), and no non-main snapshots
exist before or after expiration (`hasNonMainSnapshots`) — otherwise it falls
back to `ReachableFileCleanup`.
Iceberg `1.6.x ~ 1.9.x` lack that guard: for a single ref they
unconditionally
pick the incremental strategy and are exposed. Bumping Iceberg core is not
always an option for existing deployments, so this PR backports an equivalent
guard at the Amoro layer.
## Brief change log
- `IcebergTableMaintainer`: before committing `expireSnapshots`, check
`hasSnapshotsOutsideMainAncestry()` — which walks the same
`SnapshotUtil.ancestorIds` chain the cleanup uses — and force
`ReachableFileCleanup` only when the table is in the at-risk state. Healthy
tables keep Iceberg's automatic (incremental) selection, so there is no
cost
impact.
- `ReachableFileCleanupBridge` (new, in package `org.apache.iceberg`):
`RemoveSnapshots#withIncrementalCleanup(boolean)` is package-private, so
the
bridge calls it without reflection. Compile-time binding means a future
signature change breaks the build instead of silently failing.
`ReachableFileCleanup` does not rely on the ancestor walk; it deletes only
the
complement of the file set referenced by live snapshots, so it is immune to
the
walk truncation regardless of ref count.
## How was this patch tested?
- [x] Added a regression test (`TestExpireSnapshotsKeepReferencedFiles`) that
reproduces the incident: it builds the carry-over + broken-chain state and
asserts that data files still referenced by the current snapshot survive
expiration.
- [ ] Add screenshots for manual tests if appropriate
- [x] Run test locally before making a pull request
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]