[jira] [Commented] (FLINK-29200) Provide the way to delay the pod deletion for debugging purpose

Yang Wang (Jira) Wed, 07 Sep 2022 21:00:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-29200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601618#comment-17601618
 ]


Yang Wang commented on FLINK-29200:
-----------------------------------

I am hesitant to add this to the production code. With a pod template, you 
can mount a hostPath volume or a PV to persist the logs and even core dump files.
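
For illustration, a minimal pod template sketch along these lines is shown 
below. The main container name flink-main-container and the log directory 
/opt/flink/log follow the official Flink image and docs; the node paths and 
the core dump directory are hypothetical, and core dumps only land there if 
the node's core_pattern points at that directory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: taskmanager-pod-template
spec:
  containers:
    # Flink expects the main container to keep this name.
    - name: flink-main-container
      volumeMounts:
        # Log directory of the official Flink image (assumed unchanged).
        - name: flink-logs
          mountPath: /opt/flink/log
        # Hypothetical core dump directory; requires a matching core_pattern on the node.
        - name: core-dumps
          mountPath: /var/coredumps
  volumes:
    - name: flink-logs
      hostPath:
        path: /var/log/flink      # hypothetical path on the node
        type: DirectoryOrCreate
    - name: core-dumps
      hostPath:
        path: /var/coredumps      # hypothetical path on the node
        type: DirectoryOrCreate
```

Such a file would typically be passed through the kubernetes.pod-template-file 
option (or its taskmanager-specific variant), assuming the option name matches 
your Flink version. The files then stay on the node after the pod is deleted.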

Moreover, if the TaskManager process has already crashed, I do not think we 
could exec into the pod and do any useful debugging.

> Provide the way to delay the pod deletion for debugging purpose
> ---------------------------------------------------------------
>
>                 Key: FLINK-29200
>                 URL: https://issues.apache.org/jira/browse/FLINK-29200
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>            Reporter: Aitozi
>            Priority: Major
>
> Currently, if the TaskManager heartbeat times out, the pod is deleted 
> immediately. This makes it inconvenient to debug the underlying cause, e.g. 
> we cannot easily get the core dump files if the process crashed because of 
> a JVM bug.
> So I propose introducing an option to control the delay of the pod 
> deletion. It could be enabled to keep the pod alive for debugging.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
