[
https://issues.apache.org/jira/browse/FLINK-21942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann closed FLINK-21942.
---------------------------------
Fix Version/s: 1.12.3
Resolution: Fixed
Fixed via
1.13.0:
8aa510b705bdcfe5b8ff69bc0e294a56b437f53e
6b40ff1f384c5a2253c8393c3612d3384ae6bfc5
2eb5d1ce886824fb9eb61847ab56ffba4223a2bf
1.12.3:
3409e7f7e52d1dcb70ce238177bcd837f9bb15d3
8c475b3f0e40be34325a7b37a5b4dbbca738b55d
c25dc3f83e07adf4f0788d09201b03bfc8e92801
> KubernetesLeaderRetrievalDriver not closed after terminated which lead to
> connection leak
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-21942
> URL: https://issues.apache.org/jira/browse/FLINK-21942
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.12.2, 1.13.0
> Reporter: Yi Tang
> Assignee: Yang Wang
> Priority: Major
> Labels: k8s-ha, pull-request-available
> Fix For: 1.13.0, 1.12.3
>
> Attachments: image-2021-03-24-18-08-30-196.png,
> image-2021-03-24-18-08-42-116.png, jstack.l
>
>
> Looks like the KubernetesLeaderRetrievalDriver is not closed even after the
> KubernetesLeaderElectionDriver is closed and the job reaches a globally
> terminal state. This leaves many ConfigMap watches active, each holding a
> connection to K8s.
> Once the connections exceed the maximum number of concurrent requests, new
> ConfigMap watches cannot be started, and eventually all newly submitted jobs
> time out.
> [~fly_in_gis] [~trohrmann] This may be related to FLINK-20695; could you
> confirm this issue?
> However, while many jobs are running in the same session cluster, their
> ConfigMap watches must stay active. Maybe we should merge all ConfigMap
> watches into one?
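The leak pattern described above can be sketched in isolation. This is a minimal, self-contained illustration, not Flink's actual code: the class and method names (`ConfigMapWatchDriver`, `runJobsLeaky`, `runJobsClosed`, `activeWatches`) are hypothetical stand-ins for a watch-holding retrieval driver and the API server's connection budget.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the reported leak: a retrieval "driver"
// holds one watch connection that is only released by close().
public class RetrievalDriverLeakDemo {
    // Stand-in for the number of watch connections held against the API server.
    static final AtomicInteger activeWatches = new AtomicInteger();

    static class ConfigMapWatchDriver implements AutoCloseable {
        private boolean closed;
        ConfigMapWatchDriver() {
            activeWatches.incrementAndGet(); // opening the driver opens a watch connection
        }
        @Override
        public void close() {
            if (!closed) {
                closed = true;
                activeWatches.decrementAndGet(); // closing releases the connection
            }
        }
    }

    // Buggy lifecycle: nobody closes the driver when the job terminates,
    // so each finished job leaks one watch connection.
    public static int runJobsLeaky(int jobs) {
        for (int i = 0; i < jobs; i++) {
            new ConfigMapWatchDriver(); // never closed
        }
        return activeWatches.get(); // == jobs: all watches still open
    }

    // Fixed lifecycle: the driver's close is tied to the job reaching a
    // terminal state, so connections are returned as jobs finish.
    public static int runJobsClosed(int jobs) {
        for (int i = 0; i < jobs; i++) {
            try (ConfigMapWatchDriver driver = new ConfigMapWatchDriver()) {
                // ... job runs; leader retrieval is active here ...
            } // job terminal: watch connection released
        }
        return activeWatches.get(); // == 0: nothing leaked
    }

    public static void main(String[] args) {
        activeWatches.set(0);
        int leaked = runJobsLeaky(5);
        activeWatches.set(0);
        int remaining = runJobsClosed(5);
        System.out.println("leaked=" + leaked + " remaining=" + remaining);
    }
}
```

With enough leaky jobs, `activeWatches` eventually exceeds the client's concurrent-request limit, which models why new watches (and thus new job submissions) start timing out.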
--
This message was sent by Atlassian Jira
(v8.3.4#803005)