Jeffail opened a new issue, #1029:
URL: https://github.com/apache/iceberg-go/issues/1029
### Apache Iceberg version
main (development)
### Please describe the bug 🐞
### Problem
`ManifestListWriter.AddManifests` (`manifest.go:1410-1414`) requires every
input manifest to match the writer's version exactly:
```go
case 2, 3:
for _, file := range files {
if file.Version() != m.version {
return fmt.Errorf("%w: ManifestListWriter only supports version
%d manifest files", ErrInvalidArgument, m.version)
}
...
```
The Iceberg spec, however, allows a v2 manifest list to reference v1
manifest files — the format version of an individual manifest is independent of
the manifest list's version, so that a v1 table can be upgraded to v2 without
rewriting historical manifests. Java's
[`ManifestListWriter`](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/ManifestListWriter.java)
supports this; `iceberg-go` does not.
### Symptom
For tables that originated as v1, any commit after the table metadata is
upgraded to v2 fails with:
```
adding files: invalid argument: ManifestListWriter only supports version 2
manifest files
```
The failure is permanent. Setting `format-version=2` on a v1 table is a
metadata-only change — existing v1 manifest files on disk are never rewritten
and continue to surface in `snapshotProducer.existingManifests()`
(`table/snapshot_producers.go:626-634`) on every subsequent commit. The
condition affects any consumer performing a v1→v2 upgrade, whether via
`Transaction.UpgradeFormatVersion` or via a metadata-only upgrade performed by
another engine (Spark, Hive) before `iceberg-go` writes to the table.
### Root cause
The strict version gate predates the spec's mixed-version inheritance rules.
The in-memory `manifestFile` produced from a v1 input — via
`manifestFileV1.toFile()` (`manifest.go:180-236`) or via `NewManifestFile(1,
...)` — already carries the inheritance values mandated by the spec for v1
manifests inside a v2+ list:
- `Content = ManifestContentData`
- `SeqNumber = initialSequenceNumber = 0`
- `MinSeqNumber = initialSequenceNumber = 0`
So it can be encoded directly against the v2 (or v3) entry schema. The only
thing standing in the way is the version check itself.
A v3 manifest inside a v2 list, however, is *not* representable: the v2
entry schema has no `first_row_id` column, so accepting it would silently drop
data. That direction must remain rejected.
### Fix
Relax the gate to `file.Version() > m.version`:
- v2 manifest list: accepts v1 and v2 manifests
- v3 manifest list: accepts v1, v2, and v3 manifests
- v2 list still rejects v3 manifests; v1 list still rejects v2/v3 manifests
No conversion code is required. The existing `manifestFile` representation
is already version-agnostic, and the writer's avro encoder picks the schema
from `m.version`, so v1 inputs are encoded against the v2/v3 schema with the
correct inherited values. The existing `first_row_id` assignment at
`manifest.go:1419-1425` (data manifests with `FirstRowIDValue == nil`) already
covers v1 manifests passed into v3 lists.
### Relation to existing issues
- #416 / #826 / #827: `format-version` `strconv.Atoi("")` fix at the
manifest list and manifest file layers — separate symptom from the same v1→v2
upgrade flow; both stages already resolved.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]