[ https://issues.apache.org/jira/browse/FLINK-5962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926398#comment-15926398 ]

ASF GitHub Bot commented on FLINK-5962:
---------------------------------------

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/3548

    [FLINK-5962] [checkpoints] Remove scheduled cancel-task from timer queue to prevent memory leaks

    ## Bug
    
    Timer tasks that cancel checkpoints are not eagerly removed from the `Timer`
    when checkpoints abort or complete. When checkpoints are triggered very
    frequently, these stale tasks accumulate and leak memory.
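The following Java sketch (not taken from the PR) illustrates the underlying behavior of `java.util.Timer`: `TimerTask.cancel()` only marks a task as canceled, and the task stays in the timer's queue until it would have fired or until `Timer.purge()` is called. The class name, the 10-minute "timeout", and the extra still-pending task (which keeps the demo deterministic on OpenJDK, whose timer thread only discards canceled tasks when they reach the head of the queue) are illustrative assumptions.

```java
import java.util.Timer;
import java.util.TimerTask;

// Illustrative sketch, not Flink code: java.util.Timer keeps canceled
// TimerTasks in its queue until they are due or until purge() is called.
public class TimerLeakDemo {

    /** Schedules n cancellers, cancels them, and returns how many purge() removed. */
    public static int canceledTasksStillQueued(int n) {
        Timer timer = new Timer("checkpoint-canceller-demo", true); // daemon thread
        try {
            // A still-pending canceller (e.g. of a slow checkpoint) that is due first.
            // While it is the queue head, OpenJDK's timer thread does not look at
            // the canceled tasks queued behind it.
            timer.schedule(new TimerTask() {
                @Override
                public void run() { }
            }, 5 * 60 * 1000L);

            for (int i = 0; i < n; i++) {
                TimerTask canceller = new TimerTask() {
                    @Override
                    public void run() {
                        // would abort a checkpoint that exceeded its timeout
                    }
                };
                timer.schedule(canceller, 10 * 60 * 1000L); // hypothetical 10 min timeout
                canceller.cancel(); // checkpoint completed: task is only marked canceled
            }
            // The canceled tasks are still queued; purge() removes and counts them.
            return timer.purge();
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println(canceledTasksStillQueued(1000)); // prints 1000
    }
}
```

Calling `purge()` periodically would be one workaround; the PR instead switches to an executor that can drop canceled tasks immediately.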
    
    ## Changes
    
      - Replaces the `Timer` with a `ScheduledThreadPoolExecutor`, which can
        remove canceled tasks from its priority queue
      - `PendingCheckpoint` now cancels (i.e., removes) its timer task when it
        is disposed, which also happens upon successful completion
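As a rough illustration of the first change (a sketch, not the code from the PR): the JDK's `ScheduledThreadPoolExecutor` removes a canceled task from its work queue right away only when `setRemoveOnCancelPolicy(true)` is set (available since Java 7); with the default policy, canceled tasks stay queued until their delay expires. Whether the PR sets this policy is not stated here. The class and method names below are hypothetical.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: compares the executor's queue size after canceling
// scheduled "checkpoint canceller" tasks, with and without remove-on-cancel.
public class RemoveOnCancelDemo {

    /** Schedules n cancellers, cancels them, and returns the remaining queue size. */
    public static int queuedAfterCancel(int n, boolean removeOnCancel) {
        ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
        timer.setRemoveOnCancelPolicy(removeOnCancel);
        try {
            for (int i = 0; i < n; i++) {
                ScheduledFuture<?> canceller = timer.schedule(
                        () -> { /* would abort the checkpoint */ },
                        10, TimeUnit.MINUTES); // hypothetical checkpoint timeout
                // Checkpoint completed or was disposed: cancel its canceller.
                canceller.cancel(false);
            }
            return timer.getQueue().size();
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(queuedAfterCancel(1000, false)); // prints 1000
        System.out.println(queuedAfterCancel(1000, true));  // prints 0
    }
}
```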
    
    ## Tests
    
      - Adds checks into the `CheckpointCoordinatorTest`
      - Adds a test that the timer is canceled in the `PendingCheckpoint`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink timer_leak

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3548.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3548


> Cancel checkpoint canceller tasks in CheckpointCoordinator
> ----------------------------------------------------------
>
>                 Key: FLINK-5962
>                 URL: https://issues.apache.org/jira/browse/FLINK-5962
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Till Rohrmann
>            Assignee: Stephan Ewen
>            Priority: Critical
>
> The {{CheckpointCoordinator}} registers a canceller task for each running
> checkpoint. The canceller task's responsibility is to cancel a checkpoint if
> it takes too long to complete. We should cancel this task as soon as the
> checkpoint has been completed; otherwise many canceller tasks accumulate,
> which can eventually lead to an OutOfMemoryError.
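The fix proposed in the issue (cancel the canceller as soon as the checkpoint finishes) can be sketched as follows. This is a hypothetical, heavily simplified stand-in, not Flink's `CheckpointCoordinator` or `PendingCheckpoint`: every pending checkpoint keeps the `ScheduledFuture` of its canceller, and disposing the checkpoint on completion cancels that future. All names here are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Flink code: each pending checkpoint owns the
// future of its canceller, and disposing the checkpoint cancels it.
public class CheckpointCancellerSketch {

    private final ScheduledThreadPoolExecutor timer = new ScheduledThreadPoolExecutor(1);
    private final Map<Long, ScheduledFuture<?>> pending = new HashMap<>();
    private final long timeoutMillis;

    public CheckpointCancellerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
        // Let ScheduledFuture.cancel() remove the task from the queue eagerly.
        timer.setRemoveOnCancelPolicy(true);
    }

    /** Registers a canceller that aborts the checkpoint if it exceeds the timeout. */
    public synchronized void triggerCheckpoint(long checkpointId) {
        ScheduledFuture<?> canceller = timer.schedule(
                () -> abortCheckpoint(checkpointId), timeoutMillis, TimeUnit.MILLISECONDS);
        pending.put(checkpointId, canceller);
    }

    /** Called when the checkpoint completes; disposes it and cancels its canceller. */
    public synchronized boolean completeCheckpoint(long checkpointId) {
        ScheduledFuture<?> canceller = pending.remove(checkpointId);
        if (canceller == null) {
            return false; // unknown id, or already aborted by the canceller
        }
        canceller.cancel(false); // with remove-on-cancel, also dequeues the task
        return true;
    }

    /** Runs on the timer thread when a checkpoint took too long. */
    private synchronized void abortCheckpoint(long checkpointId) {
        if (pending.remove(checkpointId) != null) {
            System.out.println("checkpoint " + checkpointId + " expired");
        }
    }

    public synchronized int pendingCheckpoints() { return pending.size(); }

    public int queuedCancellers() { return timer.getQueue().size(); }

    public void shutdown() { timer.shutdownNow(); }

    public static void main(String[] args) {
        CheckpointCancellerSketch coordinator = new CheckpointCancellerSketch(10 * 60 * 1000L);
        for (long id = 1; id <= 1000; id++) {
            coordinator.triggerCheckpoint(id);
            coordinator.completeCheckpoint(id); // very frequent, fast checkpoints
        }
        // prints "0 0": no pending checkpoints and no leftover cancellers
        System.out.println(coordinator.pendingCheckpoints() + " " + coordinator.queuedCancellers());
        coordinator.shutdown();
    }
}
```

The methods are `synchronized` so that a canceller firing concurrently with completion cannot leave a stale entry behind; a real coordinator would also need to release the checkpoint's other resources on abort.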



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
