[jira] [Commented] (FLINK-8777) improve resource release when recovery from failover

ASF GitHub Bot (JIRA) Wed, 28 Feb 2018 01:12:52 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380001#comment-16380001
 ]


ASF GitHub Bot commented on FLINK-8777:
---------------------------------------

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5578#discussion_r171179946
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/state/TaskLocalStateStoreImpl.java ---
    @@ -90,6 +92,9 @@
        @GuardedBy("lock")
        private boolean disposed;
     
    +   /** Whether to discard the useless state when retrieve local checkpoint state. */
    +   private boolean retrieveWithDiscard = true;
    --- End diff --
    
    Then there are two better options in my opinion, because the flag is pure
    boilerplate:
    
    - Change the test to check what we are doing now, because that is what
      happens in the real use case.
    - Maybe even better: split the method `retrieveLocalState` further into one
      method for pruning and one package-private method that does all the pure
      retrieval, logging, and `null` transformation. In the old
      `retrieveLocalState`, do the cleanup first, then the pure
      retrieval/logging. Call the package-private method in the test.
    
    Maybe the test should then also just do both?
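
A minimal sketch of the split suggested above, not the actual Flink code. The class `LocalStateStoreSketch`, the `String` state type, and the method names `storeLocalState`, `pruneCheckpointsOtherThan`, `retrieveLocalStateWithoutPruning`, and `storedCheckpointCount` are hypothetical names chosen for illustration. Only `retrieveLocalState`, `storedTaskStateByCheckpointID`, and `lock` come from the discussion. Logging and the `null` transformation are left out to keep it short.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for TaskLocalStateStoreImpl; state is a plain String here. */
class LocalStateStoreSketch {

    private final Object lock = new Object();

    // Assumed shape: checkpoint ID -> locally stored state for that checkpoint.
    private final Map<Long, String> storedTaskStateByCheckpointID = new TreeMap<>();

    /** Hypothetical helper that registers local state for a checkpoint. */
    void storeLocalState(long checkpointID, String state) {
        synchronized (lock) {
            storedTaskStateByCheckpointID.put(checkpointID, state);
        }
    }

    /** Production path: do the cleanup first, then the pure retrieval. */
    String retrieveLocalState(long checkpointID) {
        pruneCheckpointsOtherThan(checkpointID);
        return retrieveLocalStateWithoutPruning(checkpointID);
    }

    /** Pruning only: drops every entry whose checkpoint ID differs from the requested one. */
    void pruneCheckpointsOtherThan(long checkpointID) {
        synchronized (lock) {
            storedTaskStateByCheckpointID.keySet().removeIf(id -> id != checkpointID);
        }
    }

    /** Package-private pure retrieval, callable from a test; returns null if nothing is stored. */
    String retrieveLocalStateWithoutPruning(long checkpointID) {
        synchronized (lock) {
            return storedTaskStateByCheckpointID.get(checkpointID);
        }
    }

    /** Hypothetical test hook: number of checkpoints that still hold local state. */
    int storedCheckpointCount() {
        synchronized (lock) {
            return storedTaskStateByCheckpointID.size();
        }
    }
}
```

With this shape no flag is needed: a test can call the package-private method to check pure retrieval, and call `retrieveLocalState` to check that pruning happens on the real path.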


> improve resource release when recovery from failover
> ----------------------------------------------------
>
>                 Key: FLINK-8777
>                 URL: https://issues.apache.org/jira/browse/FLINK-8777
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sihua Zhou
>            Assignee: Sihua Zhou
>            Priority: Major
>             Fix For: 1.5.0
>
>
> When recovering from a failure, {{TaskLocalStateStoreImpl.retrieveLocalState()}} 
> is invoked. At that point we can release every entry in 
> {{storedTaskStateByCheckpointID}} that does not satisfy {{entry.checkpointID 
> == checkpointID}}. This prevents a resource leak when a job loops through 
> {{local checkpoint completed => failed => local checkpoint completed => 
> failed ...}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
