rkhachatryan opened a new pull request #19331:
URL: https://github.com/apache/flink/pull/19331


   ## What is the purpose of the change
   
   As described in the ticket, in `LEGACY` restore mode the shared state of
   incremental checkpoints can be discarded regardless of whether it was created
   by this job or not.
   
   The bug was introduced in FLINK-24611. Before that change, a reference count was
   maintained for each entry; "initial" checkpoints did not decrement this count,
   which prevented their shared state from being discarded.
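To make the old behaviour concrete, here is a minimal sketch of a reference-counting registry, assuming a simplified model in which each checkpoint registers the shared-state keys it uses and unregisters them when it is subsumed. The class and method names are hypothetical and are not Flink's actual API. The "initial" checkpoint registers its state but never unregisters it, so the count never reaches zero and the file is never deleted.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the pre-FLINK-24611 reference counting described above.
 * Names are illustrative only, not Flink's actual API.
 */
public class RefCountingRegistrySketch {

    // Number of checkpoints currently referencing each shared-state key.
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** A checkpoint starts referencing the given shared-state key. */
    public void register(String key) {
        refCounts.merge(key, 1, Integer::sum);
    }

    /**
     * A checkpoint stops referencing the key (e.g. because it was subsumed).
     * Returns true if no references remain, i.e. the state would be discarded.
     */
    public boolean unregister(String key) {
        int remaining = refCounts.merge(key, -1, Integer::sum);
        if (remaining <= 0) {
            refCounts.remove(key);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RefCountingRegistrySketch registry = new RefCountingRegistrySketch();

        // The "initial" checkpoint the job was restored from registers its state
        // but is never unregistered, so it holds one reference forever.
        registry.register("sst-1");

        // A later checkpoint reuses the same file and is eventually subsumed.
        registry.register("sst-1");
        boolean discarded = registry.unregister("sst-1");

        System.out.println("sst-1 discarded: " + discarded); // false: one reference remains
    }
}
```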
   
   This PR makes `SharedStateRegistry`:
   1. remember the max checkpoint ID encountered during recovery
   2. associate each state entry with the ID of the checkpoint that created it
   3. only discard an entry if its `createdByCheckpointID` > `highestRetainCheckpointID`
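The three steps above can be sketched as a small, self-contained Java model. Only the names `createdByCheckpointID` and `highestRetainCheckpointID` come from the description; everything else (`checkpointRestored`, `register`, `unregisterUnused`, and the idea of tracking a `lastUsedCheckpointID` to decide when an entry is no longer referenced) is a hypothetical simplification, not the actual `SharedStateRegistry` implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the behaviour described above.
 * Apart from createdByCheckpointID and highestRetainCheckpointID, all names are illustrative.
 */
public class SharedStateRegistrySketch {

    /** A shared-state entry plus the checkpoints that created and last used it. */
    private static final class Entry {
        final long createdByCheckpointID;
        long lastUsedCheckpointID; // assumption of this sketch

        Entry(long checkpointID) {
            this.createdByCheckpointID = checkpointID;
            this.lastUsedCheckpointID = checkpointID;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Highest checkpoint ID seen during recovery; state created at or below it may
    // belong to another job and is never deleted. -1 means "nothing restored".
    private long highestRetainCheckpointID = -1L;

    /** (1) Remember the max checkpoint ID encountered during recovery. */
    public void checkpointRestored(long checkpointID) {
        highestRetainCheckpointID = Math.max(highestRetainCheckpointID, checkpointID);
    }

    /** (2) Associate each entry with the checkpoint that first registered it. */
    public void register(String key, long checkpointID) {
        Entry entry = entries.computeIfAbsent(key, k -> new Entry(checkpointID));
        entry.lastUsedCheckpointID = Math.max(entry.lastUsedCheckpointID, checkpointID);
    }

    /**
     * Drops entries not used by any checkpoint at or above lowestRetainedCheckpointID
     * and returns the keys whose files would actually be deleted.
     */
    public List<String> unregisterUnused(long lowestRetainedCheckpointID) {
        List<String> deleted = new ArrayList<>();
        entries.entrySet().removeIf(e -> {
            Entry entry = e.getValue();
            if (entry.lastUsedCheckpointID >= lowestRetainedCheckpointID) {
                return false; // still referenced by a retained checkpoint
            }
            // (3) Only physically discard state created after the restore point.
            if (entry.createdByCheckpointID > highestRetainCheckpointID) {
                deleted.add(e.getKey());
            }
            return true;
        });
        return deleted;
    }

    public static void main(String[] args) {
        SharedStateRegistrySketch registry = new SharedStateRegistrySketch();
        registry.checkpointRestored(5);       // restored from checkpoint 5 of another job
        registry.register("restored.sst", 5); // state inherited from the restored checkpoint
        registry.register("new.sst", 6);      // state written by this job

        // Checkpoint 7 subsumes everything older; neither file is used any more.
        System.out.println(registry.unregisterUnused(7)); // [new.sst]
    }
}
```

In this model the registry still forgets unused entries of the restored checkpoint, but it never deletes their files, which mirrors the protection that the "initial" checkpoint's non-decremented reference count used to provide.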
   
   Step (1) is performed from:
   - `CheckpointCoordinator.restoreSavepoint`, to cover the initial restore from a checkpoint
   - `SharedStateFactory`, when building the checkpoint store, to cover the failover case
   
   Doing this only in `CheckpointCoordinator` does not seem sufficient: after the
   restore, a new checkpoint can be created, and the job can later recover from it
   automatically, without calling `restoreSavepoint`
   (see `DefaultExecutionGraphFactory.createAndRestoreExecutionGraph`).
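The reasoning for recording the max ID on both paths can be illustrated with a hypothetical sketch. The two static methods below are stand-ins for the explicit restore and the failover rebuild; they are not Flink's actual classes or signatures. The point is that a registry rebuilt on failover never goes through the explicit restore path, so unless the rebuild itself records the highest recovered ID, it would start at -1 and treat every file, including ones inherited from the original restore and still referenced incrementally, as owned by this job.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of the two recovery paths.
 * None of these classes or methods are Flink's actual API.
 */
public class RecoveryPathsSketch {

    /** Minimal stand-in for the registry: only tracks the highest restored ID. */
    static final class Registry {
        long highestRetainCheckpointID = -1L;

        void checkpointRestored(long checkpointID) {
            highestRetainCheckpointID = Math.max(highestRetainCheckpointID, checkpointID);
        }
    }

    /** Path 1: explicit restore (stand-in for CheckpointCoordinator.restoreSavepoint). */
    static Registry restoreFromSavepoint(long checkpointID) {
        Registry registry = new Registry();
        registry.checkpointRestored(checkpointID);
        return registry;
    }

    /**
     * Path 2: failover, where a fresh registry is built alongside the checkpoint store
     * from the recovered checkpoints, without any restoreSavepoint call.
     */
    static Registry buildOnFailover(List<Long> recoveredCheckpointIDs) {
        Registry registry = new Registry();
        for (long id : recoveredCheckpointIDs) {
            registry.checkpointRestored(id);
        }
        return registry;
    }

    public static void main(String[] args) {
        Registry initial = restoreFromSavepoint(5);
        System.out.println(initial.highestRetainCheckpointID); // 5

        // The job later fails over and recovers automatically from checkpoint 6,
        // which may still reference files inherited from checkpoint 5.
        Registry afterFailover = buildOnFailover(Arrays.asList(6L));
        System.out.println(afterFailover.highestRetainCheckpointID); // 6
    }
}
```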
   
   ## Verifying this change
   
   `ResumeCheckpointManuallyITCase` in `LEGACY` restore mode
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.