[jira] [Commented] (FLINK-8777) improve resource release when recovery from failover

ASF GitHub Bot (JIRA) Tue, 27 Feb 2018 19:11:11 -0800

    [ https://issues.apache.org/jira/browse/FLINK-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379717#comment-16379717 ]


ASF GitHub Bot commented on FLINK-8777:
---------------------------------------

Github user sihuazhou commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5578#discussion_r171133066
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java ---
    @@ -300,6 +291,32 @@ private void deleteDirectory(File directory) throws IOException {
                }
        }
     
    +   /**
    +    * Pruning the useless checkpoints.
    +    */
    +   private void pruneCheckpoints(long checkpointID, boolean breakTheIteration) {
    +
    +           Iterator<Map.Entry<Long, TaskStateSnapshot>> entryIterator =
    +                   storedTaskStateByCheckpointID.entrySet().iterator();
    +
    +           final List<Map.Entry<Long, TaskStateSnapshot>> toRemove = new ArrayList<>();
    +
    +           while (entryIterator.hasNext()) {
    +
    +                   Map.Entry<Long, TaskStateSnapshot> snapshotEntry = entryIterator.next();
    +                   long entryCheckpointId = snapshotEntry.getKey();
    +
    +                   if (entryCheckpointId != checkpointID) {
    --- End diff --
    
    I agree with you that the breaking case looks a bit dangerous. I think
    we could pass a `Predicate` for the `if` condition and let the caller supply
    it to this function. That would keep the caller side cleaner, and we would
    not need to pack extra logic into the `if`, which would make it complex.


> improve resource release when recovery from failover
> ----------------------------------------------------
>
>                 Key: FLINK-8777
>                 URL: https://issues.apache.org/jira/browse/FLINK-8777
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Assignee: Sihua Zhou
>            Priority: Major
>             Fix For: 1.5.0
>
>
> When recovering from a failure, {{TaskLocalStateStoreImpl.retrieveLocalState()}}
> is invoked. At that point we can release every entry in
> {{storedTaskStateByCheckpointID}} that does not satisfy
> {{entry.checkpointID == checkpointID}}. This prevents a resource leak when
> the job loops through {{local checkpoint completed => failed => local
> checkpoint completed => failed ...}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
