Github user shixiaogang commented on the issue:

    https://github.com/apache/flink/pull/3801
  
    Hi @gyfora, I am very happy to hear from you. Below are the answers to 
your questions. Please let me know if you have any thoughts on them.
    
    1. Incremental checkpoints support rescaling. It's true that the 
implementation checkpoints files directly, and each file holds multiple key 
groups together. But when the degree of parallelism changes, each file is 
passed to every state backend that owns key groups in that file. Each backend 
then iterates over all the key-value pairs in the files it receives and picks 
up the pairs that belong to it.
    
    2. When we restore from a full snapshot (which is formatted as key-value 
pairs), the next incremental checkpoint will contain all the files. This may 
seem a little inefficient, but it follows from my intent to make each 
checkpoint self-contained. Given that full snapshots and incremental snapshots 
are in different formats, we have to take a "full" incremental snapshot as the 
base for the following checkpoints.
    
    3. That is a very good question. It would be more flexible to let users 
choose the checkpoint scheme (say, one full checkpoint after every n 
incremental checkpoints). But I think making every checkpoint incremental is 
acceptable because incremental checkpoints are more efficient in most cases. 
Backends that do not support incremental checkpointing can still take full 
snapshots.
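
    To illustrate point 1, the filtering on restore can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: the class `RescaleRestoreSketch`, its method names, the plain `hashCode`-based key-group assignment, and the contiguous range split are all assumptions made for the example, and checkpoint files are modeled as in-memory maps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RescaleRestoreSketch {

    // Hypothetical key-group assignment: a plain hash modulo the max parallelism.
    // The real backend may use a different hash function.
    static int keyGroupOf(String key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Hypothetical contiguous split: inclusive [start, end] key groups owned by one subtask.
    static int[] rangeFor(int subtask, int parallelism, int maxParallelism) {
        int start = subtask * maxParallelism / parallelism;
        int end = (subtask + 1) * maxParallelism / parallelism - 1;
        return new int[] {start, end};
    }

    // Each backend scans every file handed to it and keeps only the pairs
    // whose key group falls inside its own range.
    static Map<String, String> restore(
            List<Map<String, String>> files, int subtask, int parallelism, int maxParallelism) {
        int[] range = rangeFor(subtask, parallelism, maxParallelism);
        Map<String, String> state = new LinkedHashMap<>();
        for (Map<String, String> file : files) {
            for (Map.Entry<String, String> kv : file.entrySet()) {
                int keyGroup = keyGroupOf(kv.getKey(), maxParallelism);
                if (keyGroup >= range[0] && keyGroup <= range[1]) {
                    state.put(kv.getKey(), kv.getValue());
                }
            }
        }
        return state;
    }
}
```

    In this sketch, a file written by one backend before rescaling is read by every new backend whose range overlaps it, so the cost of a restore grows with the size of the shared files rather than with the size of each backend's own state.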
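
    To illustrate point 2, here is a minimal sketch of why the first incremental checkpoint after restoring from a full snapshot has to contain every file. It assumes that an incremental checkpoint uploads only the files not already covered by the previous incremental checkpoint; the class `IncrementalFilesSketch`, the method `filesToUpload`, and the file names are hypothetical and not taken from this PR.

```java
import java.util.Set;
import java.util.TreeSet;

final class IncrementalFilesSketch {

    // Returns the files that still need uploading: everything in the current
    // file set that the previous incremental checkpoint does not already cover.
    // After restoring from a full (key-value) snapshot there is no previous
    // incremental checkpoint, so the caller passes an empty set.
    static Set<String> filesToUpload(Set<String> currentFiles, Set<String> previousIncrementalFiles) {
        Set<String> toUpload = new TreeSet<>(currentFiles);
        toUpload.removeAll(previousIncrementalFiles);
        return toUpload;
    }
}
```

    With an empty previous set the result equals the full current file set, which corresponds to the "full" incremental snapshot that serves as the base for later checkpoints.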

