mxm commented on PR #14092: URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3324776480
> @mxm How do you envision splitting these two points? The root problem is `DynamicWriteResultAggregator` producing multiple commit requests per table, branch, and checkpointId triplet. The `DynamicCommitter` is agnostic to this and commits upstream requests; it doesn't have any special handling for partition spec changes. I cleaned up `DynamicCommitter` to reflect a single commit expectation, which feels like it should be part of this change.

I've left some thoughts in https://github.com/apache/iceberg/issues/14090. I also have an implementation, which I'll share. Basically, the idea is to use as few snapshots as possible. We can combine append-only WriteResults into a single snapshot. Whenever delete files are present, we need multiple snapshots. To make this fault tolerant, we need to store an index into the list of write results and persist it as part of the snapshot summary, similar to the Flink checkpoint id. We can then skip any previously applied WriteResults on recovery.

As for multiple partition specs per snapshot, I couldn't find anything saying that this is not permitted in Iceberg.
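
To make the batching and recovery idea concrete, here is a rough sketch in plain Java. Everything in it is hypothetical: `SnapshotPlanner`, `PendingResult`, and `PlannedSnapshot` are illustrative stand-ins, not Iceberg or Flink classes, and it is not the implementation mentioned in the comment. It assumes that each write result containing delete files gets its own snapshot, that consecutive append-only results are merged into one, and that the last index covered by a snapshot is what would be written to the snapshot summary.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are Iceberg or Flink APIs. */
final class SnapshotPlanner {

  /** Stand-in for a write result: only tracks whether it carries delete files. */
  record PendingResult(String name, boolean hasDeletes) {}

  /**
   * One planned snapshot covering results[from..to] (inclusive). In this sketch,
   * "to" is the index that would be persisted in the snapshot summary.
   */
  record PlannedSnapshot(int from, int to, boolean hasDeletes) {}

  /**
   * Plans the snapshots still to be committed.
   *
   * @param results all write results for one checkpoint, in order
   * @param lastCommittedIndex highest index already recorded in a snapshot
   *     summary, or -1 if nothing has been committed yet
   */
  static List<PlannedSnapshot> plan(List<PendingResult> results, int lastCommittedIndex) {
    List<PlannedSnapshot> plan = new ArrayList<>();
    int runStart = -1; // start of the current append-only run, -1 if none is open

    // Results up to lastCommittedIndex were applied before the failure; skip them.
    for (int i = lastCommittedIndex + 1; i < results.size(); i++) {
      if (results.get(i).hasDeletes()) {
        if (runStart >= 0) {
          // Close the open append-only run as a single snapshot.
          plan.add(new PlannedSnapshot(runStart, i - 1, false));
          runStart = -1;
        }
        // Assumption: a result with deletes is committed as its own snapshot.
        plan.add(new PlannedSnapshot(i, i, true));
      } else if (runStart < 0) {
        runStart = i;
      }
    }
    if (runStart >= 0) {
      plan.add(new PlannedSnapshot(runStart, results.size() - 1, false));
    }
    return plan;
  }
}
```

With four results where only the third has deletes, a fresh run plans three snapshots (indices 0-1, 2, and 3). If recovery finds index 1 in the latest snapshot summary, calling `plan(results, 1)` plans only the last two.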
