rdblue commented on apache/iceberg#2900 (https://github.com/apache/iceberg/issues/2900#issuecomment-896121910):
@ayush-san, the problem is that snapshot expiration dropped snapshots that were needed to reconstruct the history of your table between the last checkpoint and the current state. Because Flink can't determine what changed between those two states, it throws an error. There isn't anything wrong with the table or with snapshot expiration itself; you just expired snapshots that you ended up needing.

To avoid the problem, I recommend keeping snapshots around for a longer period of time so that you don't expire snapshots that haven't been processed by your job yet (see the first sketch below).

It also looks like you'd benefit from compaction in the streaming job, which we've been discussing elsewhere. There have been a few ideas around this, but the one I like best is a second set of writers that write larger files across checkpoints once the content has already been committed. When those files get large enough, they swap the small files for a large one (see the second sketch below).

Getting back to the purpose of this issue, it looks like the problem was probably too much metadata for each table. Regular maintenance is probably the right fix.
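As a rough illustration of the retention suggestion, here is a minimal sketch using Iceberg's core `Table` API to expire snapshots while keeping a generous window. The 7-day cutoff and `retainLast(100)` are placeholder values, not recommendations; pick a window longer than your job's worst-case lag behind the table.

```java
import java.util.concurrent.TimeUnit;
import org.apache.iceberg.Table;

public class SafeExpiration {
  // "table" is assumed to be loaded from your catalog elsewhere.
  static void expireWithGenerousRetention(Table table) {
    long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(7);
    table.expireSnapshots()
        .expireOlderThan(cutoff) // only drop snapshots older than the window
        .retainLast(100)         // and always keep the most recent 100
        .commit();
  }
}
```

You can get a similar effect declaratively through the table properties `history.expire.max-snapshot-age-ms` and `history.expire.min-snapshots-to-keep`, which control the defaults that expiration uses.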

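The in-job writer swap described above isn't something you can drop in today; the closest off-the-shelf tool is Iceberg's rewrite-data-files maintenance action, run as a separate batch job. A sketch, assuming a recent Iceberg release with the Spark actions module on the classpath; the 512 MB target file size is illustrative.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.spark.actions.SparkActions;

public class Compaction {
  // "table" is assumed to be loaded from your catalog; run this periodically
  // (e.g. from a scheduler) rather than inside the streaming job itself.
  static void compactSmallFiles(Table table) {
    SparkActions.get()
        .rewriteDataFiles(table)
        .option("target-file-size-bytes", String.valueOf(512L * 1024 * 1024))
        .execute();
  }
}
```

Running this on a schedule keeps the file count (and therefore the metadata size) bounded, which also addresses the "too much metadata" point above.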