[jira] [Updated] (FLINK-21942) KubernetesLeaderRetrievalDriver not closed after terminated which lead to connection leak

Yi Tang (Jira) Tue, 23 Mar 2021 20:59:07 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yi Tang updated FLINK-21942:
----------------------------
    Description: 
It looks like KubernetesLeaderRetrievalDriver is not closed even after the
KubernetesLeaderElectionDriver is closed and the job reaches a globally
terminated state. As a result, many config map watches stay active and keep
their connections to K8s open.

When the number of connections exceeds the maximum number of concurrent
requests, new config map watches cannot be started. Eventually, all newly
submitted jobs time out.
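
The failure mode described above can be sketched in a few lines of Java. This
is only an illustration, not Flink's code: LimitedWatchClient and
WatchLeakDemo are hypothetical names, and the semaphore merely stands in for
the client's limit on concurrent requests.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical stand-in for a client that caps concurrent watch connections. */
final class LimitedWatchClient {
    private final Semaphore slots;

    LimitedWatchClient(int maxConcurrentWatches) {
        this.slots = new Semaphore(maxConcurrentWatches);
    }

    /** Opens a watch, or returns null when every connection slot is taken. */
    Watch tryWatch(String configMapName) {
        return slots.tryAcquire() ? new Watch(configMapName, slots) : null;
    }

    /** A watch holds one connection slot until it is closed. */
    static final class Watch implements AutoCloseable {
        private final String configMapName;
        private final Semaphore slots;
        private boolean closed;

        private Watch(String configMapName, Semaphore slots) {
            this.configMapName = configMapName;
            this.slots = slots;
        }

        String configMapName() {
            return configMapName;
        }

        @Override
        public synchronized void close() {
            if (!closed) {
                closed = true;
                slots.release(); // give the connection slot back
            }
        }
    }
}

final class WatchLeakDemo {
    /** Runs the given number of jobs one after another; returns how many watches failed to start. */
    static int runJobs(LimitedWatchClient client, int jobs, boolean closeOnTermination) {
        int failed = 0;
        for (int i = 0; i < jobs; i++) {
            LimitedWatchClient.Watch watch = client.tryWatch("job-" + i + "-leader");
            if (watch == null) {
                failed++; // no free slot: this job's watch cannot be started
                continue;
            }
            // ... the job runs and reaches a globally terminated state here ...
            if (closeOnTermination) {
                watch.close();
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println(runJobs(new LimitedWatchClient(3), 5, false)); // prints 2
        System.out.println(runJobs(new LimitedWatchClient(3), 5, true));  // prints 0
    }
}
```

With three slots and five sequential jobs, the run that never closes its
watches loses the last two jobs, while the run that closes on termination
loses none.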

[~fly_in_gis] [~trohrmann] This may be related to FLINK-20695. Could you
confirm this issue?
However, when many jobs are running in the same session cluster, their config
map watches need to stay active. Maybe we should merge all the config map
watches?
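
One possible reading of "merging" the watches, sketched in Java under loud
assumptions: a single underlying watch delivers every config map event to
onEvent, which dispatches to per-job listeners. SharedConfigMapWatch and
Subscription are hypothetical names and do not exist in Flink.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical dispatcher that lets many jobs share one watch connection. */
final class SharedConfigMapWatch {
    /** Handle that removes one registration without touching the shared watch. */
    interface Subscription extends AutoCloseable {
        @Override
        void close();
    }

    private final Map<String, Consumer<String>> listeners = new ConcurrentHashMap<>();

    /** Registers a listener for one config map (one listener per name in this sketch). */
    Subscription subscribe(String configMapName, Consumer<String> listener) {
        listeners.put(configMapName, listener);
        return () -> listeners.remove(configMapName, listener);
    }

    /** Assumed to be called by the single underlying watch for every event. */
    void onEvent(String configMapName, String leaderAddress) {
        Consumer<String> listener = listeners.get(configMapName);
        if (listener != null) {
            listener.accept(leaderAddress);
        }
    }

    int subscriberCount() {
        return listeners.size();
    }

    public static void main(String[] args) {
        SharedConfigMapWatch shared = new SharedConfigMapWatch();
        Subscription jobA = shared.subscribe("job-a-leader", addr -> System.out.println("job-a: " + addr));
        shared.onEvent("job-a-leader", "address-1"); // delivered to job-a
        jobA.close();                                // job-a terminated
        shared.onEvent("job-a-leader", "address-2"); // ignored, no listener left
    }
}
```

In this shape a terminated job only drops its listener, so the number of
connections to K8s does not grow with the number of jobs in the session
cluster.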

  was:
It looks like KubernetesLeaderRetrievalDriver is not closed even after the
KubernetesLeaderElectionDriver is closed and the job reaches a globally
terminated state. As a result, many config map watches stay active and keep
their connections to K8s open.

When the number of connections exceeds the maximum number of concurrent
requests, new config map watches cannot be started. Eventually, all newly
submitted jobs time out.

[~fly_in_gis] [~trohrmann] This may be related to FLINK-20695. Could you
confirm this issue?
However, when many jobs are running in the same session cluster, the config
maps need to stay active. Maybe we should merge all the config map watches?


> KubernetesLeaderRetrievalDriver not closed after terminated which lead to 
> connection leak
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-21942
>                 URL: https://issues.apache.org/jira/browse/FLINK-21942
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Yi Tang
>            Priority: Major
>
> It looks like KubernetesLeaderRetrievalDriver is not closed even after the
> KubernetesLeaderElectionDriver is closed and the job reaches a globally
> terminated state. As a result, many config map watches stay active and keep
> their connections to K8s open.
> When the number of connections exceeds the maximum number of concurrent
> requests, new config map watches cannot be started. Eventually, all newly
> submitted jobs time out.
> [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695. Could you
> confirm this issue?
> However, when many jobs are running in the same session cluster, their
> config map watches need to stay active. Maybe we should merge all the config
> map watches?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
