Reo-LEI commented on pull request #3323: URL: https://github.com/apache/iceberg/pull/3323#issuecomment-1025491259
I finished refactoring this PR according to @rdblue's [comment](https://github.com/apache/iceberg/pull/3323#issuecomment-962068331) and have updated the PR description.

To briefly explain the change: I added `IcebergRewriteTaskEmitter` to collect all data files and delete files committed by the Flink job, as well as all eq-delete files committed by other writers. It emits `CombinedScanTask`s to `IcebergStreamRewriter` so that rewrite file groups in the same partition can be rewritten in parallel. Rewrite results are grouped by starting snapshot id and partition and committed in batches.

I have tested this in our production environment, where it has been running for a while, and it works for both v1 and v2 tables. For time-based partitioned v1 tables, we can use the streaming rewrite to replace the batch rewrite action.

@rdblue @jackye1995 @openinx @stevenzwu @kbendick Could you please take another look when you are free?
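
To make the batching step concrete, here is a minimal sketch of grouping rewrite results by starting snapshot id and partition before committing. It is an illustration only, not the code in this PR: `RewriteResult`, `groupForCommit`, and the string-valued partition and file names are hypothetical stand-ins, and no Iceberg or Flink classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RewriteResultGrouping {

  // Hypothetical stand-in for one rewrite result; not an Iceberg class.
  static final class RewriteResult {
    final long startingSnapshotId;
    final String partition;
    final String addedFile;

    RewriteResult(long startingSnapshotId, String partition, String addedFile) {
      this.startingSnapshotId = startingSnapshotId;
      this.partition = partition;
      this.addedFile = addedFile;
    }
  }

  // Groups results as: starting snapshot id -> partition -> rewritten files.
  // Each inner entry represents one batch that could be committed together.
  static Map<Long, Map<String, List<String>>> groupForCommit(List<RewriteResult> results) {
    Map<Long, Map<String, List<String>>> groups = new TreeMap<>();
    for (RewriteResult r : results) {
      groups
          .computeIfAbsent(r.startingSnapshotId, id -> new TreeMap<>())
          .computeIfAbsent(r.partition, p -> new ArrayList<>())
          .add(r.addedFile);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<RewriteResult> results = new ArrayList<>();
    results.add(new RewriteResult(1L, "dt=2022-01-30", "a.parquet"));
    results.add(new RewriteResult(1L, "dt=2022-01-30", "b.parquet"));
    results.add(new RewriteResult(1L, "dt=2022-01-31", "c.parquet"));
    results.add(new RewriteResult(2L, "dt=2022-01-30", "d.parquet"));

    // Print each batch; a real committer would commit one batch at a time.
    groupForCommit(results).forEach((snapshotId, byPartition) ->
        byPartition.forEach((partition, files) ->
            System.out.println(snapshotId + " / " + partition + " -> " + files)));
  }
}
```

In this sketch, results from different rewriter subtasks that share a starting snapshot id and partition end up in the same batch, so they could be committed together.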
