[
https://issues.apache.org/jira/browse/FLINK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714501#comment-14714501
]
ASF GitHub Bot commented on FLINK-2356:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/1063
[FLINK-2356] Add shutdown hook to CheckpointCoordinator
This adds a shutdown hook that shuts down the checkpoint coordinator when the
JobManager receives a SIGINT.
The implementation follows the one we already use for other
services that clean up via shutdown hooks.
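For readers unfamiliar with the pattern, here is a minimal sketch of a service that registers a JVM shutdown hook and removes it again on regular shutdown. It is not the code from this pull request: the class name CoordinatorSketch and its methods are hypothetical, and the actual cleanup is left as a placeholder comment.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a service that owns resources; not Flink's actual CheckpointCoordinator. */
final class CoordinatorSketch {

    // Guards the cleanup so it runs at most once, whichever path triggers it first.
    private final AtomicBoolean shutdown = new AtomicBoolean(false);

    private final Thread shutdownHook;

    CoordinatorSketch() {
        // The hook runs when the JVM exits (for example after SIGINT),
        // covering the case where shutdown() was never called.
        this.shutdownHook = new Thread(this::releaseResources, "coordinator-sketch-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    /** Regular shutdown path, e.g. once a job has reached a final state. */
    void shutdown() {
        releaseResources();
        try {
            // The hook is no longer needed; removing it avoids keeping this object alive.
            Runtime.getRuntime().removeShutdownHook(shutdownHook);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down, so hooks can no longer be removed.
        }
    }

    boolean isShutdown() {
        return shutdown.get();
    }

    private void releaseResources() {
        if (shutdown.compareAndSet(false, true)) {
            // Placeholder: discard checkpoint state, stop timers, and so on.
        }
    }
}
```

Calling shutdown() more than once is harmless in this sketch: the flag makes the cleanup idempotent, and removing a hook that is no longer registered simply returns false.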
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink checkpoint-coord-2356-master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1063.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1063
----
commit 11acb5a9fd0fc48e0445711a6f2aa18f2aa68d36
Author: Ufuk Celebi <[email protected]>
Date: 2015-08-26T16:03:28Z
[FLINK-2356] Add shutdown hook to CheckpointCoordinator to prevent resource
leaks
----
> Resource leak in checkpoint coordinator
> ---------------------------------------
>
> Key: FLINK-2356
> URL: https://issues.apache.org/jira/browse/FLINK-2356
> Project: Flink
> Issue Type: Bug
> Components: JobManager, Streaming
> Affects Versions: 0.9, master
> Reporter: Ufuk Celebi
> Fix For: 0.10, 0.9.1
>
>
> The shutdown method of the checkpoint coordinator is not called when a Flink
> cluster is shut down via SIGINT. The issue is that the checkpoint coordinator
> shutdown/cleanup is only called after the job enters a final state. This does
> not happen for regular cluster shutdown (via kill). Because we don't have
> proper stopping of streaming jobs, every program using checkpointing is
> affected.
> I've tested this only locally for now with a custom WordCount that checkpoints
> the current count. After stopping the process, the checkpoint files still
> exist. Since this is the same mechanism as in a distributed setup with HDFS,
> files should linger in HDFS as well.
> The problem is that the postStop method of the JM actor is not called when
> shutting down. The task manager components that need to do resource cleanup
> register custom shutdown hooks and don't rely on a shutdown call from the
> task manager.
> For 0.9.1 we need to make sure that the state is simply cleaned up with a
> shutdown hook (as in the blob manager). For 0.10 with HA we need to be more
> careful and not clean it up when other job manager instances need access. See
> FLINK-2354 for details.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)