mzzz-zzm opened a new issue, #976:
URL: https://github.com/apache/iceberg-go/issues/976
### Apache Iceberg version
main (development)
### Please describe the bug 🐞
When a `Transaction.Commit` is retried after a catalog conflict (HTTP 409 /
`ErrCommitFailed`), the retry re-submits the *original* `AddSnapshotUpdate`
that
was built before the first attempt. That snapshot's manifest list was written
against the **stale parent** (or an empty table), so it only points to the
retrying writer's own data files. Even though the catalog ultimately accepts
the
retried commit, the resulting snapshot ignores all files that were committed
by
concurrent writers between the first attempt and the retry. A subsequent
table
scan therefore returns fewer rows than expected.
## Affected component
`table/snapshot_producers.go` — `snapshotProducer.commit()`
`table/table.go` — `doCommit()` retry loop
## Steps to reproduce
The test is self-contained. Save the file below as
`table/bug_repro_manifest_list_test.go` in an unmodified checkout of `main`
and
run:
```
go test ./table/ -run TestBugRepro_ManifestListNotInheritedOnRetry -v
```
Expected output on unfixed upstream:
```
--- FAIL: TestBugRepro_ManifestListNotInheritedOnRetry
bug_repro_manifest_list_test.go:203:
Error: "[0x...]" should have 2 item(s), but has 1
Messages: expected 2 manifests (one per writer); got 1 — manifest
list not rebuilt on OCC retry
```
The full source of `table/bug_repro_manifest_list_test.go` (package
`table_test`,
no external helpers required):
```go
package table_test
import (
"context"
"fmt"
"path/filepath"
"sync"
"testing"
"github.com/apache/arrow-go/v18/arrow"
"github.com/apache/arrow-go/v18/arrow/array"
"github.com/apache/arrow-go/v18/arrow/memory"
"github.com/apache/iceberg-go"
iceio "github.com/apache/iceberg-go/io"
"github.com/apache/iceberg-go/table"
"github.com/stretchr/testify/require"
)
type manifestListRepro1Catalog struct {
mu sync.Mutex
current table.Metadata
location string
}
func newManifestListRepro1Catalog(meta table.Metadata, location string)
*manifestListRepro1Catalog {
return &manifestListRepro1Catalog{current: meta, location: location}
}
func (c *manifestListRepro1Catalog) LoadTable(_ context.Context, ident
table.Identifier) (*table.Table, error) {
c.mu.Lock()
meta := c.current
c.mu.Unlock()
return table.New(
ident, meta, c.location+"/metadata/v1.metadata.json",
func(_ context.Context) (iceio.IO, error) { return
iceio.LocalFS{}, nil },
c,
), nil
}
func (c *manifestListRepro1Catalog) CommitTable(
_ context.Context,
ident table.Identifier,
reqs []table.Requirement,
updates []table.Update,
) (table.Metadata, string, error) {
c.mu.Lock()
defer c.mu.Unlock()
for _, req := range reqs {
if err := req.Validate(c.current); err != nil {
return nil, "", fmt.Errorf("%w: CAS check failed: %v",
table.ErrCommitFailed, err)
}
}
newMeta, err := table.UpdateTableMetadata(c.current, updates, "")
if err != nil {
return nil, "", err
}
c.current = newMeta
return newMeta, c.location + "/metadata/committed.metadata.json", nil
}
func TestBugRepro_ManifestListNotInheritedOnRetry(t *testing.T) {
location := filepath.ToSlash(t.TempDir())
ctx := context.Background()
schema := iceberg.NewSchema(0,
iceberg.NestedField{ID: 1, Name: "id", Type:
iceberg.PrimitiveTypes.Int64},
iceberg.NestedField{ID: 2, Name: "val", Type:
iceberg.PrimitiveTypes.String},
)
// CommitNumRetriesKey must be >= 1: the bug lives in the retry path.
// The upstream default is 0, which would cause an immediate failure
// rather than a silent data-loss on retry.
props := iceberg.Properties{
table.PropertyFormatVersion: "2",
table.CommitNumRetriesKey: "1",
table.CommitMinRetryWaitMsKey: "0",
table.CommitMaxRetryWaitMsKey: "0",
table.CommitTotalRetryTimeoutMsKey: "60000",
}
metaEmpty, err := table.NewMetadata(
schema, iceberg.UnpartitionedSpec, table.UnsortedSortOrder,
location, props,
)
require.NoError(t, err)
cat := newManifestListRepro1Catalog(metaEmpty, location)
fsF := func(_ context.Context) (iceio.IO, error) { return
iceio.LocalFS{}, nil }
arrowSc := arrow.NewSchema([]arrow.Field{
{Name: "id", Type: arrow.PrimitiveTypes.Int64, Nullable: true},
{Name: "val", Type: arrow.BinaryTypes.String, Nullable: true},
}, nil)
makeRow := func() arrow.Table {
tbl, e := array.TableFromJSON(memory.DefaultAllocator, arrowSc,
[]string{`[{"id": 1, "val": "x"}]`})
require.NoError(t, e)
return tbl
}
newTbl := func(base table.Metadata) *table.Table {
return table.New([]string{"default", "repro1"}, base,
location+"/metadata/v1.metadata.json", fsF, cat)
}
// Writer B commits one row — catalog HEAD advances to Writer B's
snapshot.
rowB := makeRow()
defer rowB.Release()
txB := newTbl(metaEmpty).NewTransaction()
require.NoError(t, txB.AppendTable(ctx, rowB, 1, nil))
_, err = txB.Commit(ctx)
require.NoError(t, err, "Writer B must commit successfully")
// Writer A starts from the EMPTY base (stale).
// Attempt 1: CAS fails (catalog HEAD = Writer B's snapshot,
requirement = nil).
// Retry: reload catalog (post-B), rewrite AssertRefSnapshotID,
re-submit
// the SAME AddSnapshotUpdate with the stale manifest list
path.
rowA := makeRow()
defer rowA.Release()
txA := newTbl(metaEmpty).NewTransaction()
require.NoError(t, txA.AppendTable(ctx, rowA, 1, nil))
_, err = txA.Commit(ctx)
require.NoError(t, err, "Writer A must succeed after OCC retry")
cat.mu.Lock()
finalMeta := cat.current
cat.mu.Unlock()
finalSnap := finalMeta.CurrentSnapshot()
require.NotNil(t, finalSnap)
manifests, err := finalSnap.Manifests(iceio.LocalFS{})
require.NoError(t, err)
// BUG: manifest list not rebuilt — only 1 manifest (Writer A's).
// Writer B's data is absent from the final snapshot.
require.Len(t, manifests, 2,
"expected 2 manifests (one per writer); got %d — manifest list
not rebuilt on OCC retry",
len(manifests))
}
```
## Root cause
`snapshotProducer.commit()` writes the manifest list Avro file to object
storage
**once**, before the first `CommitTable` call. It uses `sp.parentSnapshotID`
(the
stale parent at transaction-creation time) to determine which existing
manifests
to inherit. The returned `AddSnapshotUpdate` carries a hard-coded
`ManifestList` path pointing to that file.
On `ErrCommitFailed`, `doCommit` (`table/table.go`) does correctly reload the
catalog's current metadata and rewrite the `AssertRefSnapshotID` requirement
to
point at the new branch head. However, the `updates` slice — which contains
the
`AddSnapshotUpdate` with the stale manifest list path — is **not** rebuilt.
The
retried commit therefore references a manifest list that was written against
the
pre-conflict parent and omits all manifests that concurrent writers committed
between attempt 0 and the retry.
## Expected behaviour
On each OCC retry the manifest list should be rebuilt against the **fresh**
parent snapshot (the one just loaded from the catalog) to include all
manifests
that concurrent writers committed between the first attempt and the retry.
The
retrying writer's own new manifests should then be appended on top.
Without this rebuild, the final snapshot silently omits data files written by
concurrent committers, causing row-level data loss that is invisible to the
writer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]