[
https://issues.apache.org/jira/browse/FLINK-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029433#comment-17029433
]
Canbin Zheng commented on FLINK-15843:
--------------------------------------
Well, it depends. If all temporary files are created in EmptyDir, K8s will
clean them up. However, one could mount some of the temporary directories,
such as {{state.backend.rocksdb.localdir}} or {{io.tmp.dirs}}, to a HostPath
or PersistentVolume, in which case there is no chance for clean-up. This could
become problematic once "Support mounting volumes" is finished.
> Do not violently kill TaskManagers on Kubernetes
> ------------------------------------------------
>
> Key: FLINK-15843
> URL: https://issues.apache.org/jira/browse/FLINK-15843
> Project: Flink
> Issue Type: Sub-task
> Components: Deployment / Kubernetes
> Affects Versions: 1.10.0
> Reporter: Canbin Zheng
> Priority: Major
> Fix For: 1.11.0
>
>
> The current way of stopping a TaskManager instance when the JobManager sends
> a deletion request is to directly call
> {{KubernetesClient.pods().withName().delete}}. That instance is then
> violently killed with a _KILL_ signal and has no chance to clean up, which
> could cause problems because we expect the process to terminate gracefully
> when it is no longer needed.
> According to the guide on [Termination of Pods|#termination-of-pods],
> Kubernetes first sends a _TERM_ signal to the main process in each
> container, and may follow up with a forced _KILL_ signal if the graceful
> shut-down period has expired. The Unix signal is sent to the process that
> has PID 1 ([Docker
> Kill|https://docs.docker.com/engine/reference/commandline/kill/]). However,
> the TaskManagerRunner process is spawned by
> /opt/flink/bin/kubernetes-entry.sh and could never have PID 1, so it would
> never receive the Unix signal.
>
> One workaround could be that the JobManager first sends a *KILL_WORKER*
> message to the TaskManager, the TaskManager then gracefully terminates
> itself to ensure that the clean-up is completely finished, and finally the
> JobManager deletes the Pod after a configurable graceful shut-down period.
>
>
>
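For illustration, the workaround quoted above (send *KILL_WORKER*, let the TaskManager clean up, then delete the Pod) could be sketched roughly as follows. This is only a minimal sketch and not Flink's actual implementation: the {{Worker}} and {{PodDeleter}} interfaces, the {{stopWorker}} method and the Pod name are hypothetical stand-ins. It also assumes the Pod is deleted as soon as the clean-up completes or the grace period expires, whichever comes first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for a TaskManager that can be asked to shut down. */
interface Worker {
    /** Handles the KILL_WORKER message; the future completes once clean-up is done. */
    CompletableFuture<Void> killWorker();
}

/** Hypothetical stand-in for the Kubernetes call that deletes the Pod. */
interface PodDeleter {
    void deletePod(String podName);
}

final class GracefulStopSketch {

    /**
     * Asks the worker to terminate itself, waits at most gracePeriodMillis for
     * the clean-up to finish, and then deletes the Pod in every case.
     *
     * @return true if the worker reported a finished clean-up within the grace period
     */
    static boolean stopWorker(
            Worker worker, PodDeleter deleter, String podName, long gracePeriodMillis) {
        boolean cleanedUp = false;
        try {
            worker.killWorker().get(gracePeriodMillis, TimeUnit.MILLISECONDS);
            cleanedUp = true;
        } catch (TimeoutException e) {
            // Grace period expired; fall through to the forced deletion below.
        } catch (ExecutionException e) {
            // Clean-up failed on the worker side; the Pod is still deleted.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // The Pod is removed whether or not the clean-up succeeded.
            deleter.deletePod(podName);
        }
        return cleanedUp;
    }

    public static void main(String[] args) {
        // A worker whose clean-up finishes immediately.
        Worker fast = () -> CompletableFuture.completedFuture(null);
        boolean ok = stopWorker(
                fast, name -> System.out.println("deleting " + name), "taskmanager-1", 1000L);
        System.out.println("clean-up finished in time: " + ok);
    }
}
```

The deletion sits in the {{finally}} block so that a TaskManager that hangs or fails during clean-up cannot keep its Pod alive beyond the configured period.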
--
This message was sent by Atlassian Jira
(v8.3.4#803005)