carp84 commented on pull request #8751:
URL: https://github.com/apache/flink/pull/8751#issuecomment-804291991


   Thanks for the clarification, @StephanEwen. I think size-based and 
time-based snapshots are two options, each with its own advantages. A size-based 
snapshot for log truncation/consolidation would indeed prevent the small-file 
problem, but it may also lengthen recovery: up to 0.95 x write_buffer_size of 
logs can accumulate before the size-based flush is triggered, and all of it 
must be replayed during restore. A time-based snapshot compacts/truncates the 
log at a fixed pace but may introduce the small-file problem, unless we add some 
mechanism similar to the proposal here, which I think is still valuable.
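
   To make the replay bound concrete, here is a rough sketch of the arithmetic. 
The function name and the 64 MiB buffer size are assumptions for illustration 
only, not Flink or RocksDB API; the 95% ratio is the figure mentioned above.

```python
def worst_case_replay_bytes(write_buffer_size: int, flush_percent: int = 95) -> int:
    """Upper bound on log bytes replayed on restore with a size-based flush.

    Hypothetical helper: assumes the flush triggers once the log reaches
    flush_percent of write_buffer_size, so anything below that threshold
    has not been snapshotted yet and must be replayed.
    """
    # Integer arithmetic avoids floating-point rounding in the result.
    return write_buffer_size * flush_percent // 100


# Assumed example value: a 64 MiB write buffer.
buffer_size = 64 * 1024 * 1024
print(worst_case_replay_bytes(buffer_size))
```

With a time-based trigger the bound would instead depend on the write rate 
multiplied by the snapshot interval, which is why the two options trade 
recovery time against file size differently.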
   
   And just to confirm, are we aiming to completely replace the 
snapshot-based checkpoint with the log-based checkpoint in the future? Or will 
both be kept for different user scenarios? If the latter, I think we still need 
to resolve the small-file problem for snapshot-based checkpoints.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

