[ https://issues.apache.org/jira/browse/FLINK-29200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601618#comment-17601618 ]
Yang Wang commented on FLINK-29200:
-----------------------------------
I hesitate to do this in the production code. By using the pod template, you
could mount a hostPath volume or a PV to persist the logs and even core dump
files. Moreover, if the TaskManager process has already crashed, I do not think
we could tunnel into the pod and do any debugging.
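
As a rough illustration of the pod template approach, something like the following might work. It is only a sketch: the volume name, the node directory /var/log/flink, and the template's metadata name are placeholders, and it assumes the default log directory /opt/flink/log of the official Flink image. The file would be passed to Flink through the kubernetes.pod-template-file option.

```yaml
# Hypothetical pod template; names and paths are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: taskmanager-pod-template
spec:
  containers:
    # Flink merges its own settings into the container with this exact name.
    - name: flink-main-container
      volumeMounts:
        # Assumes the default log directory of the official Flink image.
        - name: flink-logs
          mountPath: /opt/flink/log
  volumes:
    - name: flink-logs
      hostPath:
        # Assumed directory on the node; it outlives the pod.
        path: /var/log/flink
        type: DirectoryOrCreate
```

With a mount like this, the log files stay on the node after the pod is deleted. Persisting core dumps would need a similar mount at whatever location the dumps are written to, which depends on the node and JVM settings.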
> Provide the way to delay the pod deletion for debugging purpose
> ---------------------------------------------------------------
>
> Key: FLINK-29200
> URL: https://issues.apache.org/jira/browse/FLINK-29200
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Reporter: Aitozi
> Priority: Major
>
> Currently, if the TaskManager heartbeat times out, the pod is deleted
> immediately. This makes it inconvenient to debug the underlying cause, e.g.
> we cannot easily get the core dump files if it crashed because of JVM bugs
> and so on.
> So, I propose introducing an option to control the delay of the pod
> deletion. It can be enabled to keep the pod alive for debugging.