mxm commented on PR #14092:
URL: https://github.com/apache/iceberg/pull/14092#issuecomment-3324776480

   > @mxm How do you envision splitting these two points? The root problem is 
`DynamicWriteResultAggregator` producing multiple commit requests per table, 
branch, and checkpointId triplet. The `DynamicCommitter` is agnostic to this 
and commits upstream requests; it doesn't have any special handling for 
partition spec changes. I cleaned up `DynamicCommitter` to reflect a single 
commit expectation, which feels like it should be part of this change.
   
   I've left some thoughts in https://github.com/apache/iceberg/issues/14090. I 
also have an implementation which I'll share. Basically, the idea is to use as 
few snapshots as possible. We can combine append-only WriteResults into a 
single snapshot. Whenever delete files are present, we need multiple snapshots. 
To make this fault tolerant, we need to store an index into the list of write 
results and persist it as part of the snapshot summary, similarly to the Flink 
checkpoint id. We can then skip any previously applied WriteResults on recovery.
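
A minimal sketch of the snapshot-batching and recovery idea above. It rests on several assumptions. `PendingResult` is a hypothetical stand-in for a write result and only tracks file counts. The summary keys `example.checkpoint-id` and `example.write-result-index` are hypothetical, not real Iceberg or Flink property names. Giving every result that carries delete files its own snapshot is a conservative reading of "we need multiple snapshots", not necessarily what the actual implementation does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SnapshotBatchingSketch {

  // Hypothetical stand-in for a write result: only the file counts matter here.
  record PendingResult(int dataFiles, int deleteFiles) {
    boolean hasDeletes() {
      return deleteFiles > 0;
    }
  }

  // One planned snapshot: the results it contains and the index of the last one included.
  record Batch(List<PendingResult> results, int lastIndex) {}

  // Hypothetical summary keys, not real Iceberg or Flink property names.
  static final String CHECKPOINT_ID_KEY = "example.checkpoint-id";
  static final String WRITE_RESULT_INDEX_KEY = "example.write-result-index";

  // Plans as few snapshots as possible, skipping results already covered by a committed index.
  static List<Batch> plan(List<PendingResult> results, int lastCommittedIndex) {
    List<Batch> batches = new ArrayList<>();
    List<PendingResult> current = new ArrayList<>();
    int currentLast = -1;
    // Start right after the last index recorded in an already-committed snapshot summary.
    for (int i = lastCommittedIndex + 1; i < results.size(); i++) {
      PendingResult r = results.get(i);
      if (r.hasDeletes()) {
        // Flush pending appends first so they land in an earlier snapshot than the deletes.
        if (!current.isEmpty()) {
          batches.add(new Batch(current, currentLast));
          current = new ArrayList<>();
        }
        batches.add(new Batch(List.of(r), i));
      } else {
        // Consecutive append-only results share one snapshot.
        current.add(r);
        currentLast = i;
      }
    }
    if (!current.isEmpty()) {
      batches.add(new Batch(current, currentLast));
    }
    return batches;
  }

  // Summary entries a committer would attach to the snapshot created for this batch.
  static Map<String, String> summaryFor(long checkpointId, Batch batch) {
    return Map.of(
        CHECKPOINT_ID_KEY, Long.toString(checkpointId),
        WRITE_RESULT_INDEX_KEY, Integer.toString(batch.lastIndex()));
  }

  public static void main(String[] args) {
    List<PendingResult> results = List.of(
        new PendingResult(2, 0),  // index 0: append-only
        new PendingResult(1, 0),  // index 1: append-only
        new PendingResult(1, 3),  // index 2: carries delete files
        new PendingResult(4, 0)); // index 3: append-only
    // Fresh commit for checkpoint 42: three snapshots (indices 0-1, 2, 3).
    for (Batch b : plan(results, -1)) {
      System.out.println(b.results().size() + " result(s) -> " + summaryFor(42L, b));
    }
    // After a restart where the snapshot recording index 2 was already committed,
    // only index 3 remains to be applied.
    System.out.println(plan(results, 2).size() + " snapshot(s) left");
  }
}
```

On recovery, the committer would read the highest recorded index for the checkpoint from the table's snapshot summaries and pass it as `lastCommittedIndex`, in the same way the Flink checkpoint id is used today to skip already-committed checkpoints.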
   
   As for multiple partition specs per snapshot, I couldn't find anything in 
Iceberg stating that this is not permitted.
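
If files written under several partition specs may indeed share one snapshot, the aggregator could key its output by the (table, branch, checkpointId) triplet and emit exactly one commit request per key. That matches the single-commit expectation for `DynamicCommitter` described in the quoted comment. The following is a minimal sketch only. `CommitKey` and `TaggedResult` are hypothetical types, not the real `DynamicWriteResultAggregator` internals, and `specId` is carried along only to show that a spec change does not split the request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CommitRequestGroupingSketch {

  // Hypothetical key: the committer expects one request per (table, branch, checkpointId).
  record CommitKey(String table, String branch, long checkpointId) {}

  // Hypothetical, simplified write result; specId is the partition spec it was written with.
  record TaggedResult(String table, String branch, long checkpointId, int specId, String file) {}

  // Collapses every result for the same triplet into one request, regardless of specId.
  static Map<CommitKey, List<TaggedResult>> group(List<TaggedResult> results) {
    Map<CommitKey, List<TaggedResult>> requests = new LinkedHashMap<>();
    for (TaggedResult r : results) {
      CommitKey key = new CommitKey(r.table(), r.branch(), r.checkpointId());
      requests.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
    }
    return requests;
  }

  public static void main(String[] args) {
    List<TaggedResult> results = List.of(
        new TaggedResult("db.events", "main", 7L, 0, "a.parquet"),
        new TaggedResult("db.events", "main", 7L, 1, "b.parquet"), // spec changed mid-checkpoint
        new TaggedResult("db.events", "audit", 7L, 1, "c.parquet"));
    // Two requests: (db.events, main, 7) holding both specs, and (db.events, audit, 7).
    group(results).forEach((key, batch) ->
        System.out.println(key + " -> " + batch.size() + " file(s)"));
  }
}
```

Keying on the triplet rather than on (triplet, spec) is exactly the design question raised here. It is only valid if Iceberg accepts files from different partition specs in a single commit.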


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
