[ https://issues.apache.org/jira/browse/FLINK-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051230#comment-17051230 ]

Andrey Zagrebin commented on FLINK-16416:
-----------------------------------------

As I understand it, the TM process is designed to run continuously in
production. This means that there are two reasons for it to exit:
 * It is no longer needed, so it can simply be killed. In this case, there
should usually be no running jobs; otherwise, why would users stop it?
 * It fails fatally. In this case, all jobs have to be restarted anyway, either
from scratch or from the latest savepoint. There are no guarantees for anything
that happened after the job start or the latest checkpoint.

In both cases, there is no need to take care of a graceful shutdown in terms
of releasing anything within the JVM, as the TM is either no longer needed or
dead. For special resources outside the scope of the JVM system process, there
is the shutdown hook.
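
A minimal sketch of the shutdown-hook mechanism using only the JDK's
{{Runtime#addShutdownHook}}. {{ExternalResource}} and {{registerCleanupHook}}
are hypothetical names standing in for a resource outside the JVM (not Flink
classes). The JVM runs registered hooks when the last non-daemon thread exits,
on {{System.exit}}, and on SIGTERM/SIGINT; it does not run them on SIGKILL
({{kill -9}}).

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ExternalResourceCleanup {

    // Hypothetical stand-in for a resource outside the JVM process
    // (e.g. a device reservation or lock file); not a Flink class.
    public static final class ExternalResource implements AutoCloseable {
        private final AtomicBoolean released = new AtomicBoolean(false);

        public boolean isReleased() {
            return released.get();
        }

        @Override
        public void close() {
            // compareAndSet makes release idempotent: a second close() is a no-op
            if (released.compareAndSet(false, true)) {
                System.out.println("external resource released");
            }
        }
    }

    /** Registers a JVM shutdown hook that closes the resource and returns the hook thread. */
    public static Thread registerCleanupHook(ExternalResource resource) {
        Thread hook = new Thread(resource::close, "external-resource-cleanup");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        ExternalResource resource = new ExternalResource();
        registerCleanupHook(resource);
        System.out.println("working...");
        // When main returns (or on System.exit / SIGTERM / SIGINT) the hook runs
        // and prints "external resource released"; on SIGKILL it never runs.
    }
}
```

Making {{close()}} idempotent matters because the same resource may also be
released explicitly before the hook fires.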

> Shutdown the task manager gracefully in standalone mode
> -------------------------------------------------------
>
>                 Key: FLINK-16416
>                 URL: https://issues.apache.org/jira/browse/FLINK-16416
>             Project: Flink
>          Issue Type: Improvement
>          Components: Command Line Client
>            Reporter: Yangze Guo
>            Priority: Major
>
> Recently, I tried to add a new {{GPUManager}} to the {{TaskExecutorServices}}.
> I registered the {{GPUManager#close}} function, which contains some cleanup
> logic, in {{TaskExecutorServices#shutDown}}. However, I found that the
> cleanup logic does not run as expected in standalone mode.
> After investigating the codebase, I found that
> {{TaskExecutorServices#shutDown}} is called only on a fatal error, whereas
> we simply kill the TM process in {{flink-daemon.sh}}. However, the log
> shows that some services, e.g. {{TaskExecutorLocalStateStoresManager}}, did
> clean up after themselves by registering a {{shutdownHook}}.
> If that is the right way, then we need to register a {{shutdownHook}} for
> {{TaskExecutorServices}} as well.
> If not, we may need to find another way to shut down the TM gracefully.
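
The proposal of registering a shutdown hook for the services container can be
sketched roughly as follows. This is a hypothetical simplification, not Flink's
actual {{TaskExecutorServices}}; all class and method names are illustrative.
Services are closed in reverse registration order, each at most once, whether
cleanup is triggered by the JVM shutdown hook (e.g. the process receiving
SIGTERM) or by an explicit shutdown on a fatal error; the explicit path also
deregisters the hook.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical simplification of a services container such as TaskExecutorServices;
// class and method names are illustrative, not Flink's actual API.
public class ServicesWithShutdownHook implements AutoCloseable {
    private final ConcurrentLinkedDeque<AutoCloseable> services = new ConcurrentLinkedDeque<>();
    private final AtomicBoolean shutDown = new AtomicBoolean(false);
    private final Thread shutdownHook;

    public ServicesWithShutdownHook() {
        // Covers the path where the process is terminated with SIGTERM rather than failing fatally.
        shutdownHook = new Thread(this::shutDownServices, "services-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    public void register(AutoCloseable service) {
        services.push(service); // newest first, so services close in reverse registration order
    }

    /** Closes each registered service at most once, reporting failures instead of aborting. */
    private void shutDownServices() {
        if (!shutDown.compareAndSet(false, true)) {
            return; // another path already started the cleanup
        }
        List<Exception> failures = new ArrayList<>();
        AutoCloseable service;
        while ((service = services.poll()) != null) {
            try {
                service.close();
            } catch (Exception e) {
                failures.add(e);
            }
        }
        for (Exception e : failures) {
            System.err.println("service cleanup failed: " + e);
        }
    }

    /** Explicit shutdown path (e.g. on a fatal error): clean up, then deregister the hook. */
    @Override
    public void close() {
        shutDownServices();
        try {
            Runtime.getRuntime().removeShutdownHook(shutdownHook);
        } catch (IllegalStateException alreadyShuttingDown) {
            // The JVM is already shutting down, so the hook can no longer be removed;
            // if it runs, the shutDown flag turns it into a no-op.
        }
    }

    public static void main(String[] args) {
        ServicesWithShutdownHook services = new ServicesWithShutdownHook();
        services.register(() -> System.out.println("closing first-registered service"));
        services.register(() -> System.out.println("closing GPU-like service"));
        // No explicit close(): when main returns, the hook closes both, newest first.
    }
}
```

The {{AtomicBoolean}} guard keeps the two paths from closing services twice.
If both race, one path may return before the other has finished closing, which
a real implementation would still need to handle.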



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
