priyankar-stripe opened a new issue, #14583:
URL: https://github.com/apache/iceberg/issues/14583

   ### Apache Iceberg version
   
   1.8.1
   
   ### Query engine
   
   Flink
   
   ### Please describe the bug 🐞
   
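   In https://github.com/apache/iceberg/pull/10523/files, we changed the cleanup logic to stop fetching the latest snapshot from the metastore and instead maintain an in-memory snapshot instance for cleanup operations. When a commit is retried after a transient failure, that in-memory snapshot can point to a manifest list that was never committed. Cleanup then deletes the manifest list that the committed snapshot actually references.
   
   Specifically, this is what we saw happen: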
   1. Initial Commit Attempt: Flink attempts to commit snapshot `<snapshot_id>` to the metastore. The commit succeeds on the metastore side, but Flink receives a transient network error and incorrectly marks the commit as failed.
   2. Retry with Stale Metadata: `RetryingMetaStoreClient` retries the commit. Because the first attempt already modified the table, the metastore returns a `The table has been modified` error, which triggers a `CommitFailedException` (see https://github.com/apache/iceberg/blob/1.8.x/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L277-L278).
   3. SnapshotProducer Retry: `SnapshotProducer` catches this exception and retries the operation. It reuses the same snapshot ID but writes a new manifest list file, `snap-<snapshot_id>-2-<uuid>.avro` **(note the incremented attempt number)**, which differs from the already-committed manifest list `snap-<snapshot_id>-1-<uuid>.avro`.
   4. No-Op Detection: Since there are no actual changes between these two attempts (same snapshot content), Iceberg detects this as a no-op and skips the commit (see https://github.com/apache/iceberg/blob/1.8.x/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L448-L453).
   5. Incorrect Cleanup: The cleanup logic then runs, but it incorrectly assumes that `snap-<snapshot_id>-2-<uuid>.avro` is the committed manifest list (because it belongs to the most recent attempt). It therefore deletes `snap-<snapshot_id>-1-<uuid>.avro` as an "uncommitted" file, even though the active snapshot references it, thereby corrupting the active snapshot.
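   
   The steps above can be reduced to a small, hypothetical model of the cleanup rule: every manifest list written during the commit attempts is deleted except the one believed to be committed. This is a sketch, not Iceberg's actual `SnapshotProducer` code. The class and method names, the snapshot ID `42`, and the UUIDs are invented for illustration. Only the `snap-<snapshot_id>-<attempt>-<uuid>.avro` naming pattern comes from the report. The model contrasts choosing the "committed" file from the in-memory snapshot with choosing it from the metastore's current snapshot.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;
   import java.util.TreeSet;
   
   // Hypothetical, simplified model of the manifest-list cleanup described in the steps above.
   // This is NOT Iceberg's SnapshotProducer; all names and IDs are illustrative only.
   public class ManifestListCleanupSketch {
   
     // Naming pattern from the report: snap-<snapshot_id>-<attempt>-<uuid>.avro
     static String manifestListName(long snapshotId, int attempt, String uuid) {
       return "snap-" + snapshotId + "-" + attempt + "-" + uuid + ".avro";
     }
   
     // Returns every manifest list written during the commit attempts except the one
     // assumed to be committed. The real cleanup would delete these files.
     static Set<String> filesToDelete(List<String> writtenManifestLists, String assumedCommitted) {
       Set<String> toDelete = new TreeSet<>();
       for (String location : writtenManifestLists) {
         if (!location.equals(assumedCommitted)) {
           toDelete.add(location);
         }
       }
       return toDelete;
     }
   
     public static void main(String[] args) {
       long snapshotId = 42L; // placeholder snapshot ID
       List<String> written = new ArrayList<>();
   
       // Attempt 1: succeeds in the metastore, but the client sees a network error.
       String attempt1 = manifestListName(snapshotId, 1, "aaaa");
       written.add(attempt1);
       String referencedByMetastore = attempt1; // what the committed table metadata points to
   
       // Attempt 2: same snapshot ID, new manifest list; the commit is then skipped as a no-op.
       String attempt2 = manifestListName(snapshotId, 2, "bbbb");
       written.add(attempt2);
       String inMemoryLatest = attempt2; // what the in-memory snapshot instance points to
   
       // Anchoring cleanup on the in-memory snapshot deletes the committed manifest list.
       System.out.println("in-memory choice deletes: " + filesToDelete(written, inMemoryLatest));
       // Anchoring cleanup on the metastore's current snapshot deletes only the orphaned file.
       System.out.println("metastore choice deletes: " + filesToDelete(written, referencedByMetastore));
     }
   }
   ```
   
   Running `main` prints `in-memory choice deletes: [snap-42-1-aaaa.avro]` and then `metastore choice deletes: [snap-42-2-bbbb.avro]`. Under these assumptions, only the choice of which snapshot counts as committed decides whether cleanup removes an orphaned file or the manifest list that the table currently depends on.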
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time

