[GitHub] [iceberg] hameizi commented on pull request #2867: Flink: Auto compact file

GitBox Mon, 13 Sep 2021 19:21:48 -0700


hameizi commented on pull request #2867:
URL: https://github.com/apache/iceberg/pull/2867#issuecomment-918739609



   > I think the parallel commit proposal that @rdblue proposed could work
   
   In this PR, the Flink rewriteAction runs in parallel, so it will not slow 
down data processing. Once the snapshot function succeeds, Flink continues 
processing data and does not wait for the result of notifyCheckpointComplete.
   
   > I wonder what is the initial drive behind this implementation.
   
   Automatically compacting files at every checkpoint in Flink solves several 
problems. 
   1. It keeps queries on the Iceberg table consistently fast. In our scenario, 
queries are slow even though we schedule file compaction every day; daily 
compaction is not enough.
   2. It fixes the bug where duplicate rows appear in an Iceberg primary key 
table when files are compacted: https://github.com/apache/iceberg/issues/2308 . 
Because we strictly commit one snapshot and then compact files, in that order, 
no other snapshot can be committed while compaction is in progress. 
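
The ordering described above can be illustrated with a toy simulation. This is only a sketch of the idea, not the code in this PR and not the Flink or Iceberg API: `ToyTable`, `ToyCommitter`, and `run_demo` are hypothetical names, and `notify_checkpoint_complete` merely borrows the name of the Flink callback. It assumes a single worker thread that, for each checkpoint, commits the snapshot and then compacts, while the caller keeps processing data without waiting.

```python
import queue
import threading


class ToyTable:
    """Hypothetical stand-in for a table: an ordered log of operations."""

    def __init__(self):
        self.log = []

    def commit(self, checkpoint_id):
        self.log.append(("commit", checkpoint_id))

    def compact(self, checkpoint_id):
        self.log.append(("compact", checkpoint_id))


class ToyCommitter:
    """Runs commit-then-compact for each checkpoint on one worker thread."""

    def __init__(self, table):
        self.table = table
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def notify_checkpoint_complete(self, checkpoint_id):
        # Only enqueues the work and returns, so the data path is not blocked.
        self._tasks.put(checkpoint_id)

    def _run(self):
        while True:
            checkpoint_id = self._tasks.get()
            if checkpoint_id is None:  # sentinel: stop the worker
                break
            # Strict order: commit one snapshot, then compact, before the
            # next checkpoint's commit is picked up from the queue.
            self.table.commit(checkpoint_id)
            self.table.compact(checkpoint_id)

    def close(self):
        self._tasks.put(None)
        self._worker.join()


def run_demo(num_checkpoints=3):
    table = ToyTable()
    committer = ToyCommitter(table)
    for checkpoint_id in range(1, num_checkpoints + 1):
        # Data processing would continue here between checkpoints.
        committer.notify_checkpoint_complete(checkpoint_id)
    committer.close()
    return table.log
```

Because only one worker drains the queue, a commit for checkpoint N+1 cannot land between the commit and the compaction of checkpoint N, which is the property the second point relies on.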


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


