mzzz-zzm opened a new issue, #976:
URL: https://github.com/apache/iceberg-go/issues/976

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   When a `Transaction.Commit` is retried after a catalog conflict (HTTP 409 /
   `ErrCommitFailed`), the retry re-submits the *original* `AddSnapshotUpdate` 
that
   was built before the first attempt. That snapshot's manifest list was written
   against the **stale parent** (or an empty table), so it only points to the
   retrying writer's own data files. Even though the catalog ultimately accepts 
the
   retried commit, the resulting snapshot ignores all files that were committed 
by
   concurrent writers between the first attempt and the retry. A subsequent 
table
   scan therefore returns fewer rows than expected.
   
   ## Affected component
   
   `table/snapshot_producers.go` — `snapshotProducer.commit()`  
   `table/table.go` — `doCommit()` retry loop
   
   ## Steps to reproduce
   
   The test is self-contained. Save the file below as
   `table/bug_repro_manifest_list_test.go` in an unmodified checkout of `main` 
and
   run:
   
   ```
   go test ./table/ -run TestBugRepro_ManifestListNotInheritedOnRetry -v
   ```
   
   Expected output on unfixed upstream:
   
   ```
   --- FAIL: TestBugRepro_ManifestListNotInheritedOnRetry
       bug_repro_manifest_list_test.go:203:
           Error:  "[0x...]" should have 2 item(s), but has 1
           Messages: expected 2 manifests (one per writer); got 1 — manifest 
list not rebuilt on OCC retry
   ```
   
   The full source of `table/bug_repro_manifest_list_test.go` (package 
`table_test`,
   no external helpers required):
   
   ```go
   package table_test
   
   import (
        "context"
        "fmt"
        "path/filepath"
        "sync"
        "testing"
   
        "github.com/apache/arrow-go/v18/arrow"
        "github.com/apache/arrow-go/v18/arrow/array"
        "github.com/apache/arrow-go/v18/arrow/memory"
        "github.com/apache/iceberg-go"
        iceio "github.com/apache/iceberg-go/io"
        "github.com/apache/iceberg-go/table"
        "github.com/stretchr/testify/require"
   )
   
   type manifestListRepro1Catalog struct {
        mu       sync.Mutex
        current  table.Metadata
        location string
   }
   
   func newManifestListRepro1Catalog(meta table.Metadata, location string) 
*manifestListRepro1Catalog {
        return &manifestListRepro1Catalog{current: meta, location: location}
   }
   
   func (c *manifestListRepro1Catalog) LoadTable(_ context.Context, ident 
table.Identifier) (*table.Table, error) {
        c.mu.Lock()
        meta := c.current
        c.mu.Unlock()
        return table.New(
                ident, meta, c.location+"/metadata/v1.metadata.json",
                func(_ context.Context) (iceio.IO, error) { return 
iceio.LocalFS{}, nil },
                c,
        ), nil
   }
   
   func (c *manifestListRepro1Catalog) CommitTable(
        _ context.Context,
        ident table.Identifier,
        reqs []table.Requirement,
        updates []table.Update,
   ) (table.Metadata, string, error) {
        c.mu.Lock()
        defer c.mu.Unlock()
        for _, req := range reqs {
                if err := req.Validate(c.current); err != nil {
                        return nil, "", fmt.Errorf("%w: CAS check failed: %v", 
table.ErrCommitFailed, err)
                }
        }
        newMeta, err := table.UpdateTableMetadata(c.current, updates, "")
        if err != nil {
                return nil, "", err
        }
        c.current = newMeta
        return newMeta, c.location + "/metadata/committed.metadata.json", nil
   }
   
   func TestBugRepro_ManifestListNotInheritedOnRetry(t *testing.T) {
        location := filepath.ToSlash(t.TempDir())
        ctx := context.Background()
   
        schema := iceberg.NewSchema(0,
                iceberg.NestedField{ID: 1, Name: "id", Type: 
iceberg.PrimitiveTypes.Int64},
                iceberg.NestedField{ID: 2, Name: "val", Type: 
iceberg.PrimitiveTypes.String},
        )
        // CommitNumRetriesKey must be >= 1: the bug lives in the retry path.
        // The upstream default is 0, which would cause an immediate failure
        // rather than a silent data-loss on retry.
        props := iceberg.Properties{
                table.PropertyFormatVersion:        "2",
                table.CommitNumRetriesKey:          "1",
                table.CommitMinRetryWaitMsKey:      "0",
                table.CommitMaxRetryWaitMsKey:      "0",
                table.CommitTotalRetryTimeoutMsKey: "60000",
        }
   
        metaEmpty, err := table.NewMetadata(
                schema, iceberg.UnpartitionedSpec, table.UnsortedSortOrder, 
location, props,
        )
        require.NoError(t, err)
   
        cat := newManifestListRepro1Catalog(metaEmpty, location)
        fsF := func(_ context.Context) (iceio.IO, error) { return 
iceio.LocalFS{}, nil }
   
        arrowSc := arrow.NewSchema([]arrow.Field{
                {Name: "id", Type: arrow.PrimitiveTypes.Int64, Nullable: true},
                {Name: "val", Type: arrow.BinaryTypes.String, Nullable: true},
        }, nil)
        makeRow := func() arrow.Table {
                tbl, e := array.TableFromJSON(memory.DefaultAllocator, arrowSc,
                        []string{`[{"id": 1, "val": "x"}]`})
                require.NoError(t, e)
                return tbl
        }
        newTbl := func(base table.Metadata) *table.Table {
                return table.New([]string{"default", "repro1"}, base,
                        location+"/metadata/v1.metadata.json", fsF, cat)
        }
   
        // Writer B commits one row — catalog HEAD advances to Writer B's 
snapshot.
        rowB := makeRow()
        defer rowB.Release()
        txB := newTbl(metaEmpty).NewTransaction()
        require.NoError(t, txB.AppendTable(ctx, rowB, 1, nil))
        _, err = txB.Commit(ctx)
        require.NoError(t, err, "Writer B must commit successfully")
   
        // Writer A starts from the EMPTY base (stale).
        // Attempt 1: CAS fails (catalog HEAD = Writer B's snapshot, 
requirement = nil).
        // Retry:     reload catalog (post-B), rewrite AssertRefSnapshotID, 
re-submit
        //            the SAME AddSnapshotUpdate with the stale manifest list 
path.
        rowA := makeRow()
        defer rowA.Release()
        txA := newTbl(metaEmpty).NewTransaction()
        require.NoError(t, txA.AppendTable(ctx, rowA, 1, nil))
        _, err = txA.Commit(ctx)
        require.NoError(t, err, "Writer A must succeed after OCC retry")
   
        cat.mu.Lock()
        finalMeta := cat.current
        cat.mu.Unlock()
   
        finalSnap := finalMeta.CurrentSnapshot()
        require.NotNil(t, finalSnap)
   
        manifests, err := finalSnap.Manifests(iceio.LocalFS{})
        require.NoError(t, err)
   
        // BUG: manifest list not rebuilt — only 1 manifest (Writer A's).
        // Writer B's data is absent from the final snapshot.
        require.Len(t, manifests, 2,
                "expected 2 manifests (one per writer); got %d — manifest list 
not rebuilt on OCC retry",
                len(manifests))
   }
   ```
   
   ## Root cause
   
   `snapshotProducer.commit()` writes the manifest list Avro file to object 
storage
   **once**, before the first `CommitTable` call. It uses `sp.parentSnapshotID` 
(the
   stale parent at transaction-creation time) to determine which existing 
manifests
   to inherit. The returned `AddSnapshotUpdate` carries a hard-coded
   `ManifestList` path pointing to that file.
   
   On `ErrCommitFailed`, `doCommit` (`table/table.go`) does correctly reload the
   catalog's current metadata and rewrite the `AssertRefSnapshotID` requirement 
to
   point at the new branch head. However, the `updates` slice — which contains 
the
   `AddSnapshotUpdate` with the stale manifest list path — is **not** rebuilt. 
The
   retried commit therefore references a manifest list that was written against 
the
   pre-conflict parent and omits all manifests that concurrent writers committed
   between attempt 0 and the retry.
   
   ## Expected behaviour
   
   On each OCC retry the manifest list should be rebuilt against the **fresh**
   parent snapshot (the one just loaded from the catalog) to include all 
manifests
   that concurrent writers committed between the first attempt and the retry. 
The
   retrying writer's own new manifests should then be appended on top.
   
   Without this rebuild, the final snapshot silently omits data files written by
   concurrent committers, causing row-level data loss that is invisible to the
   writer.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to