[jira] [Commented] (FLINK-19289) K8s resource manager terminated pod garbage collection

Xintong Song (Jira) Fri, 18 Sep 2020 02:07:37 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198234#comment-17198234
 ]


Xintong Song commented on FLINK-19289:
--------------------------------------

Hi [~yittg],

Thanks for pulling me in.

I'm not sure whether I have understood the problem you described correctly. By 
"deal with this case properly", do you mean that Flink should remove the error 
pod from K8s?

If that is what you mean, I think Flink should be able to remove such pods. For 
pods terminated during JM failover, Flink should receive not only an ADDED 
event, but also the MODIFIED/ERROR events, thus triggering removal of the pods.

Have you verified that the error pod is still not removed some time after the 
JM has successfully recovered?
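
To make the question concrete, here is a minimal sketch of the handling under discussion. It is not Flink's actual implementation: `TerminatedPodGcSketch`, `Pod`, `PodPhase` and `PodWatchHandler` are hypothetical names, the "Error" pod is assumed to be in the Failed phase, and `removedPods` stands in for a real delete call against K8s. It only shows that both the ADDED and the MODIFIED callbacks would check whether the pod is already terminated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are Flink or Kubernetes client APIs. */
public class TerminatedPodGcSketch {

    /** Simplified pod phases, mirroring the Kubernetes pod phase values. */
    enum PodPhase { PENDING, RUNNING, SUCCEEDED, FAILED, UNKNOWN }

    /** Minimal stand-in for a TaskManager pod as seen by the watcher. */
    static final class Pod {
        final String name;
        final PodPhase phase;

        Pod(String name, PodPhase phase) {
            this.name = name;
            this.phase = phase;
        }

        /** Assumption: an "Error" pod shows up with the FAILED phase. */
        boolean isTerminated() {
            return phase == PodPhase.SUCCEEDED || phase == PodPhase.FAILED;
        }
    }

    /** Records which pods the handler asked to delete (stand-in for a K8s delete call). */
    static final class PodWatchHandler {
        final List<String> removedPods = new ArrayList<>();

        /** ADDED event: a pod that is already terminated is cleaned up right away. */
        void onAdded(Pod pod) {
            if (pod.isTerminated()) {
                removePod(pod);
            }
            // Otherwise the normal handling of a newly added pod would go here.
        }

        /** MODIFIED event: the usual path by which a terminated pod is noticed. */
        void onModified(Pod pod) {
            if (pod.isTerminated()) {
                removePod(pod);
            }
        }

        /** Idempotent, so receiving both ADDED and MODIFIED removes the pod once. */
        private void removePod(Pod pod) {
            if (!removedPods.contains(pod.name)) {
                removedPods.add(pod.name);
            }
        }
    }
}
```

With this shape, a pod that failed while no JM was running would be removed on the first ADDED event after recovery, even if no MODIFIED event follows.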

> K8s resource manager terminated pod garbage collection
> ------------------------------------------------------
>
>                 Key: FLINK-19289
>                 URL: https://issues.apache.org/jira/browse/FLINK-19289
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Yi Tang
>            Priority: Minor
>
> Consider the following scenario:
> While the JM is down (no JM is running), a TM goes down with an error (caused 
> by the node or by the TM itself), leaving an Error pod behind. After a JM 
> recovers, it will receive an ADDED event about this pod and do nothing.
> We should deal with this case in `onAdded` callback properly, I think.
> cc [~xintongsong].



