[ 
https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308456#comment-17308456
 ] 

Yang Wang commented on FLINK-21942:
-----------------------------------

[~yittg] Thanks for providing the information.

After more investigation, I have to admit that we currently have a leader 
ConfigMap watch leak. When the job reaches a terminal state, the JobManager 
leader retrieval service in the ResourceManager is not stopped correctly. We start 
the "job leader id monitoring" in {{ResourceManager#registerJobManager}}, but 
we do not stop it in {{disconnectJobManager}}. cc [~trohrmann]
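
To make the leak pattern concrete, here is a minimal, self-contained sketch. It is not Flink code: {{LeaderWatch}} and {{JobLeaderMonitor}} are hypothetical stand-ins for the leader retrieval service and the ResourceManager bookkeeping, and stopping the watch on every disconnect (rather than only for terminal jobs) is an assumption made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaderWatchLeakSketch {

    /** Hypothetical stand-in for a leader retrieval service that holds one ConfigMap watch. */
    static final class LeaderWatch implements AutoCloseable {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            // In a real driver this would release the watch connection to K8s.
            open = false;
        }
    }

    /** Hypothetical stand-in for the per-job monitoring kept by the ResourceManager. */
    static final class JobLeaderMonitor {
        private final Map<String, LeaderWatch> watches = new HashMap<>();

        /** Starts monitoring for a job; one watch per job id. */
        void registerJobManager(String jobId) {
            watches.computeIfAbsent(jobId, id -> new LeaderWatch());
        }

        /**
         * Stops monitoring for a job. Without the remove-and-close step below,
         * the watch would stay open after the job is gone, which is the leak.
         * Assumption: closing on every disconnect is acceptable for this sketch.
         */
        void disconnectJobManager(String jobId) {
            LeaderWatch watch = watches.remove(jobId);
            if (watch != null) {
                watch.close();
            }
        }

        /** Number of watches that still hold a connection. */
        int openWatchCount() {
            int count = 0;
            for (LeaderWatch watch : watches.values()) {
                if (watch.isOpen()) {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) {
        JobLeaderMonitor monitor = new JobLeaderMonitor();
        monitor.registerJobManager("job-a");
        monitor.registerJobManager("job-b");
        System.out.println("open watches after register: " + monitor.openWatchCount());

        monitor.disconnectJobManager("job-a");
        System.out.println("open watches after disconnect: " + monitor.openWatchCount());
    }
}
```

The point is only the symmetry: whatever {{registerJobManager}} starts needs a matching stop on the disconnect path, otherwise every finished job leaves one watch connection behind.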

 

For the second problem (not being able to start more than 60 batch jobs or 20 
streaming jobs in a session), I am trying to reproduce it.

> KubernetesLeaderRetrievalDriver not closed after terminated which lead to 
> connection leak
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-21942
>                 URL: https://issues.apache.org/jira/browse/FLINK-21942
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Yi Tang
>            Priority: Major
>         Attachments: image-2021-03-24-18-08-30-196.png, 
> image-2021-03-24-18-08-42-116.png, jstack.l
>
>
> It looks like KubernetesLeaderRetrievalDriver is not closed even after the 
> KubernetesLeaderElectionDriver is closed and the job reaches a globally 
> terminal state.
> As a result, many ConfigMap watches stay active and keep connections open to 
> K8s.
> When the number of connections exceeds the max concurrent requests, new 
> ConfigMap watches cannot be started. Eventually, all newly submitted jobs 
> time out.
> [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695, could you 
> confirm this issue?
> However, when many jobs are running in the same session cluster, their 
> ConfigMap watches are required to stay active. Maybe we should merge all 
> ConfigMap watches into one?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
