rahulsmahadev opened a new pull request, #16753:
URL: https://github.com/apache/iceberg/pull/16753
## Summary
`TableMetadata.Builder#setRef` accepts a tag ref named `main`. The main-branch
special case sets `currentSnapshotId` and appends to the snapshot log without
checking the ref type, and `validateRefs` only verifies that main's snapshot id
matches `current-snapshot-id`; it never checks the ref type. The spec requires
main to be a branch (format/spec.md, `refs`: "There is always a `main` branch
reference pointing to the `current-snapshot-id`").

Once such metadata is committed, the table can no longer be written: every
subsequent commit fails in `setBranchSnapshotInternal` with
`Cannot update branch: main is a tag`. That check also shows the builder
already assumes main is a branch elsewhere.
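
The failure mode can be sketched as a small, self-contained Java model of the pre-fix behavior. `MainTagBugModel`, `Ref`, `RefType`, `Builder`, `setBranchSnapshot`, and the nested `ValidationException` are hypothetical stand-ins, not Iceberg's `SnapshotRef` or `TableMetadata.Builder`. The model encodes only the two behaviors described above: the `main` special case ignores the ref type, and a later branch update refuses to move a ref that is a tag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the pre-fix builder behavior. None of these types are
 * Iceberg's real SnapshotRef or TableMetadata.Builder.
 */
public class MainTagBugModel {
  enum RefType { BRANCH, TAG }

  record Ref(long snapshotId, RefType type) {}

  /** Stand-in for Iceberg's ValidationException. */
  static class ValidationException extends RuntimeException {
    ValidationException(String message) {
      super(message);
    }
  }

  static final String MAIN = "main";

  static class Builder {
    final Map<String, Ref> refs = new HashMap<>();
    final List<Long> snapshotLog = new ArrayList<>();
    Long currentSnapshotId = null;

    // Pre-fix behavior: the main special case updates the current snapshot and
    // the snapshot log without ever looking at ref.type().
    Builder setRef(String name, Ref ref) {
      if (MAIN.equals(name)) {
        currentSnapshotId = ref.snapshotId();
        snapshotLog.add(ref.snapshotId());
      }
      refs.put(name, ref);
      return this;
    }

    // Every later commit to main goes through a branch update, which refuses
    // to move a ref that is not a branch.
    Builder setBranchSnapshot(String branch, long snapshotId) {
      Ref existing = refs.get(branch);
      if (existing != null && existing.type() != RefType.BRANCH) {
        throw new ValidationException("Cannot update branch: " + branch + " is a tag");
      }
      refs.put(branch, new Ref(snapshotId, RefType.BRANCH));
      if (MAIN.equals(branch)) {
        currentSnapshotId = snapshotId;
        snapshotLog.add(snapshotId);
      }
      return this;
    }
  }

  public static void main(String[] args) {
    // Accepted silently: main now points at snapshot 1, but as a tag.
    Builder builder = new Builder().setRef(MAIN, new Ref(1L, RefType.TAG));
    System.out.println("current snapshot: " + builder.currentSnapshotId);

    // The next write to main fails.
    try {
      builder.setBranchSnapshot(MAIN, 2L);
    } catch (ValidationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Running `main` prints `current snapshot: 1` and then `Cannot update branch: main is a tag`. The tag is accepted without complaint, and the damage only surfaces on the next write.
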

A tag named `main` can be written to any table whose metadata has snapshots
but no `main` ref yet:

- `newAppend().stageOnly().commit()` (write-audit-publish, WAP) followed by
  `manageSnapshots().createTag("main", stagedSnapshotId).commit()`
- after `CREATE OR REPLACE` (replacement metadata drops refs but keeps
  snapshots)
- via REST `set-snapshot-ref` updates applied through
  `TableMetadata.Builder` (update requirements assert ref snapshot ids, not ref
  types)

This change rejects the tag in `Builder#setRef` with a `ValidationException`
before any builder state is mutated. The check is deliberately not added to
`validateRefs`, so already-written metadata files containing this corruption
can still be read (and repaired by committing a branch ref for `main`).
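
The guard and the deliberately lenient read path can be sketched with the same kind of hypothetical stand-ins. `MainRefGuard`, `Ref`, `RefType`, `Builder`, `validateRefs`, and the local `ValidationException` are not Iceberg's real classes, and the exception messages are illustrative. The point the sketch makes is ordering: the type check runs before any field is assigned, so a rejected call leaves the builder exactly as it was.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the fixed setRef; types are stand-ins, not Iceberg's. */
public class MainRefGuard {
  enum RefType { BRANCH, TAG }

  record Ref(long snapshotId, RefType type) {}

  /** Stand-in for Iceberg's ValidationException. */
  static class ValidationException extends RuntimeException {
    ValidationException(String message) {
      super(message);
    }
  }

  static final String MAIN = "main";

  static class Builder {
    final Map<String, Ref> refs = new HashMap<>();
    final List<Long> snapshotLog = new ArrayList<>();
    Long currentSnapshotId = null;

    Builder setRef(String name, Ref ref) {
      // Validate first: if this throws, none of the fields below are touched.
      if (MAIN.equals(name) && ref.type() != RefType.BRANCH) {
        throw new ValidationException("Cannot set ref " + name + " to a tag; main must be a branch");
      }
      if (MAIN.equals(name)) {
        currentSnapshotId = ref.snapshotId();
        snapshotLog.add(ref.snapshotId());
      }
      refs.put(name, ref);
      return this;
    }
  }

  // Read-path check, lenient on purpose: it compares snapshot ids only, so
  // metadata written before the fix (with a main tag) still loads.
  static void validateRefs(Long currentSnapshotId, Map<String, Ref> refs) {
    Ref main = refs.get(MAIN);
    if (main != null && !Long.valueOf(main.snapshotId()).equals(currentSnapshotId)) {
      throw new ValidationException("Current snapshot ID does not match main branch");
    }
  }

  public static void main(String[] args) {
    Builder builder = new Builder();
    try {
      builder.setRef(MAIN, new Ref(1L, RefType.TAG));
    } catch (ValidationException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    // The failed call left no trace in the builder.
    System.out.println("refs after rejection: " + builder.refs);

    // Tags under any other name are unaffected.
    builder.setRef("audit", new Ref(1L, RefType.TAG));

    // Setting main to a branch works as before.
    builder.setRef(MAIN, new Ref(1L, RefType.BRANCH));
    validateRefs(builder.currentSnapshotId, builder.refs);
    System.out.println("current snapshot: " + builder.currentSnapshotId);

    // Metadata written before the fix may already hold a main tag; the
    // read-path check still accepts it.
    validateRefs(7L, Map.of(MAIN, new Ref(7L, RefType.TAG)));
    System.out.println("legacy metadata loads");
  }
}
```

Running `main` prints the rejection message, `refs after rejection: {}`, `current snapshot: 1`, and `legacy metadata loads`. Keeping the load-time check limited to the snapshot-id comparison is what lets such older metadata be opened and then repaired by writing a branch ref.
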

## Test plan

- `TestTableMetadata#testSetRefRejectsTagForMainBranch`: builder-level;
  `setRef("main", tag)` throws, while tag refs under any other name still work
- `TestSnapshotManager#testCreateTagNamedMainFails`: end-to-end repro via
  `ManageSnapshots` on a staged-only table; also asserts that the failed commit
  did not create the main ref
- Ran `:iceberg-core:test --tests TestTableMetadata --tests TestSnapshotManager`
  plus spotless/checkstyle locally

This pull request and its description were written by Isaac.