prodeezy opened a new issue, #4666: URL: https://github.com/apache/iceberg/issues/4666
We are seeing staged snapshots with aborted data files being committed to the table's active snapshot line. This leaves the table unreadable, with scans failing with java.io.FileNotFoundException.

If a fatal error occurs during the following portion of the commit procedure, Iceberg can end up promoting an invalid snapshot to the active snapshot line:
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L328-L346

**Sequence of events to reproduce:**

T1: Create a table with Write-Audit-Publish enabled.
T2: Write some data to it with wap.id=B1, using snapshot S1. During commit execution, a fatal error occurs somewhere in this code snippet: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotProducer.java#L328-L346
T3: The commit failure triggers an abort, which deletes the data under S1. S1 remains in the staged snapshot list, but it has not been cherry-picked.
T4: A different worker tries to write data with wap.id=B1, using snapshot S2.
T5: After validation, this worker filters table.snapshots(), finds S1 (which has the same wap.id), and cherry-picks it into the table.
T6: S1 gets published to the active line.
T7: Reading the table fails with FileNotFoundException, since S1's data files no longer exist.
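
The sequence T1 to T7 can be illustrated with a small self-contained model. This is only a sketch and does not use the Iceberg API: `ToyTable`, `ToySnapshot`, `scanFailsAfterRace`, and the file names are hypothetical stand-ins invented for illustration. The `allFilesExist` check before publishing is an assumed mitigation, not a fix proposed or confirmed in this issue.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class WapRaceSketch {

  // Hypothetical stand-in for a snapshot: an id, its wap.id, and the files it references.
  static final class ToySnapshot {
    final long id;
    final String wapId;
    final List<String> dataFiles;

    ToySnapshot(long id, String wapId, List<String> dataFiles) {
      this.id = id;
      this.wapId = wapId;
      this.dataFiles = List.copyOf(dataFiles);
    }
  }

  // Hypothetical stand-in for table metadata plus the underlying file store.
  static final class ToyTable {
    final Set<String> storage = new HashSet<>();             // files that physically exist
    final List<ToySnapshot> snapshots = new ArrayList<>();   // all snapshots, staged or published
    final List<ToySnapshot> activeLine = new ArrayList<>();  // published snapshots only

    // T2/T4: write the files and record a staged snapshot in metadata.
    ToySnapshot stage(long id, String wapId, List<String> files) {
      storage.addAll(files);
      ToySnapshot snapshot = new ToySnapshot(id, wapId, files);
      snapshots.add(snapshot);
      return snapshot;
    }

    // T3: the abort deletes the data files, but the snapshot stays in metadata.
    void abortCleanup(ToySnapshot snapshot) {
      storage.removeAll(snapshot.dataFiles);
    }

    // T5: look up the first unpublished snapshot carrying the given wap.id.
    Optional<ToySnapshot> findStaged(String wapId) {
      return snapshots.stream()
          .filter(s -> wapId.equals(s.wapId) && !activeLine.contains(s))
          .findFirst();
    }

    // Assumed mitigation: verify that every referenced file still exists.
    boolean allFilesExist(ToySnapshot snapshot) {
      return storage.containsAll(snapshot.dataFiles);
    }

    // T6: publish the snapshot to the active line.
    void publish(ToySnapshot snapshot) {
      activeLine.add(snapshot);
    }

    // T7: a scan touches every file referenced by the active line.
    List<String> scan() throws FileNotFoundException {
      List<String> read = new ArrayList<>();
      for (ToySnapshot snapshot : activeLine) {
        for (String file : snapshot.dataFiles) {
          if (!storage.contains(file)) {
            throw new FileNotFoundException(file);
          }
          read.add(file);
        }
      }
      return read;
    }
  }

  // Replays T1..T7 and reports whether the final scan fails.
  static boolean scanFailsAfterRace(boolean checkFilesBeforePublish) {
    ToyTable table = new ToyTable();                                        // T1
    ToySnapshot s1 = table.stage(1L, "B1", List.of("data/s1-file-0"));      // T2
    table.abortCleanup(s1);                                                 // T3
    table.stage(2L, "B1", List.of("data/s2-file-0"));                       // T4
    Optional<ToySnapshot> found = table.findStaged("B1");                   // T5: finds S1 first
    if (found.isPresent()
        && (!checkFilesBeforePublish || table.allFilesExist(found.get()))) {
      table.publish(found.get());                                           // T6
    }
    try {
      table.scan();                                                         // T7
      return false;
    } catch (FileNotFoundException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("scan fails without check: " + scanFailsAfterRace(false));
    System.out.println("scan fails with check:    " + scanFailsAfterRace(true));
  }
}
```

In this toy model the lookup by wap.id returns S1 before S2 because S1 was staged first, which mirrors T5. With the check enabled, the stale S1 is simply skipped and nothing is published; a real fix would also need to decide what happens to S2 and to the orphaned S1 entry.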
