Reo-LEI commented on pull request #3323:
URL: https://github.com/apache/iceberg/pull/3323#issuecomment-1025491259


   I have finished the refactor of this PR according to the 
[comment](https://github.com/apache/iceberg/pull/3323#issuecomment-962068331) 
from @rdblue and have updated the PR description accordingly. 
   
   To briefly explain the change: I added `IcebergRewriteTaskEmitter` to 
collect all data files and delete files committed by the Flink job, as well 
as all equality-delete files committed by other writers. It emits 
`CombinedScanTask`s to `IcebergStreamRewriter` so that file groups in the 
same partition can be rewritten in parallel. The rewrite results are then 
grouped by starting snapshot ID and partition and committed in batches.
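
The last step, grouping rewrite results by starting snapshot ID and partition before committing them in batches, can be sketched in plain Java as below. This is a minimal illustration under stated assumptions, not the code in this PR: `RewriteResultGrouper`, `CommitKey`, `RewriteResult`, and `CommitBatch` are hypothetical names, files are represented as path strings, partitions as plain strings, and no Iceberg or Flink API is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: merges per-task rewrite results that share the same
 * (starting snapshot id, partition) so each group can be committed as one batch.
 * None of these types are Iceberg or Flink APIs.
 */
final class RewriteResultGrouper {

  /** Hypothetical commit key: the snapshot the rewrite was planned from plus its partition. */
  record CommitKey(long startingSnapshotId, String partition) {}

  /** Hypothetical output of one parallel rewrite task. */
  record RewriteResult(
      long startingSnapshotId, String partition, List<String> rewrittenFiles, List<String> addedFiles) {}

  /** Hypothetical batch: every file to replace and every file to add for one commit key. */
  record CommitBatch(CommitKey key, List<String> filesToDelete, List<String> filesToAdd) {}

  private RewriteResultGrouper() {}

  /** Groups results by commit key, preserving the order in which keys are first seen. */
  static List<CommitBatch> groupForCommit(List<RewriteResult> results) {
    Map<CommitKey, CommitBatch> batches = new LinkedHashMap<>();
    for (RewriteResult result : results) {
      CommitKey key = new CommitKey(result.startingSnapshotId(), result.partition());
      // Create an empty batch the first time a key appears, then append this task's files.
      CommitBatch batch =
          batches.computeIfAbsent(key, k -> new CommitBatch(k, new ArrayList<>(), new ArrayList<>()));
      batch.filesToDelete().addAll(result.rewrittenFiles());
      batch.filesToAdd().addAll(result.addedFiles());
    }
    return new ArrayList<>(batches.values());
  }
}
```

In a real job, each `CommitBatch` would then map to a single commit against the table; that step, and any conflict validation against the starting snapshot, is deliberately left out of this sketch.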
   
   I have tested this in our production environment, where it has been 
running for a while, and it works for both v1 and v2 tables. For v1 tables 
with time-based partitioning, the streaming rewrite can replace the batch 
rewrite action.
   
   @rdblue @jackye1995 @openinx @stevenzwu @kbendick Could you please take 
another look when you are free?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.