[jira] [Comment Edited] (FLINK-22006) Could not run more than 20 jobs in a native K8s session when K8s HA enabled

Yi Tang (Jira) Mon, 29 Mar 2021 03:31:21 -0700


    [ https://issues.apache.org/jira/browse/FLINK-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310546#comment-17310546 ]


Yi Tang edited comment on FLINK-22006 at 3/29/21, 10:30 AM:
------------------------------------------------------------

[~fly_in_gis] Did you reproduce the issue?  I think it's easy to reproduce.

And I believe the problem is with the okhttp connectionPool. The Fabric8 
kubernetes client creates a separate webSocket connection for each config 
map watch. So once all connections in the connection pool are exhausted, 
no new webSocket connection can be established, and the new job cannot be 
submitted successfully.
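
To make the suspected failure mode concrete, here is a toy model of it. It is not OkHttp or fabric8 code: the class name WatchSlotModel and the limit passed to it are assumptions made purely for illustration, and the real limit and the component that enforces it still need to be confirmed.

```java
import java.util.concurrent.Semaphore;

/**
 * Toy model of the hypothesis: every config map watch holds one
 * connection slot for its whole lifetime. Hypothetical class, not
 * part of OkHttp or the fabric8 client.
 */
final class WatchSlotModel {
    private final Semaphore slots;

    WatchSlotModel(int assumedLimit) {
        // assumedLimit is an illustrative number, not a measured value
        this.slots = new Semaphore(assumedLimit);
    }

    /** Tries to open a watch; returns false once every slot is held. */
    boolean openWatch() {
        return slots.tryAcquire();
    }

    /** Closing a watch frees its slot again. */
    void closeWatch() {
        slots.release();
    }
}
```

In this model the watches are long-lived, so slots are never returned while the jobs are running, and the first openWatch() call after the limit is reached fails. That would match a job that stays in the initializing state.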

So my advice is to create a single config map watch and dispatch its events 
to the different consumers, if possible, as long as we are still using the 
fabric8 kubernetes client.

But before this, I will put in more effort to provide more details to prove it.
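
A minimal sketch of the "single watch, dispatch events" idea follows. It is written in plain Java without the fabric8 API; the class ConfigMapEventDispatcher, its methods, and the String payload are hypothetical names chosen for illustration, not existing Flink or fabric8 code. The assumption is that one shared watch would call dispatch() for every event it receives.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical fan-out for events arriving from ONE shared config map watch. */
final class ConfigMapEventDispatcher {
    // config map name -> handlers interested in that config map
    private final Map<String, List<Consumer<String>>> handlers = new ConcurrentHashMap<>();

    /** Registers a handler and returns a Runnable that unregisters it. */
    Runnable register(String configMapName, Consumer<String> handler) {
        handlers.computeIfAbsent(configMapName, k -> new CopyOnWriteArrayList<>())
                .add(handler);
        return () -> {
            List<Consumer<String>> list = handlers.get(configMapName);
            if (list != null) {
                list.remove(handler);
            }
        };
    }

    /** Called by the shared watch; returns how many handlers were notified. */
    int dispatch(String configMapName, String payload) {
        List<Consumer<String>> list =
                handlers.getOrDefault(configMapName, Collections.emptyList());
        for (Consumer<String> handler : list) {
            handler.accept(payload);
        }
        return list.size();
    }
}
```

With this shape, each usage (for example leader election or leader retrieval for one job) would register a handler under its config map name instead of opening its own watch, so the number of webSocket connections would no longer grow with the number of jobs.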


was (Author: yittg):
[~fly_in_gis] Did you reproduce the issue?  I think it's easy to reproduce.

And the problem is definitely with the okhttp connectionPool. The Fabric8 
kubernetes client creates a separate webSocket connection for each config 
map watch. So once all connections in the connection pool are exhausted, 
no new webSocket connection can be established, and the new job cannot be 
submitted successfully.

So my advice is to create a single config map watch and dispatch its events 
to the different consumers, if possible, as long as we are still using the 
fabric8 kubernetes client.

> Could not run more than 20 jobs in a native K8s session when K8s HA enabled
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-22006
>                 URL: https://issues.apache.org/jira/browse/FLINK-22006
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.2, 1.13.0
>            Reporter: Yang Wang
>            Priority: Critical
>              Labels: k8s-ha
>         Attachments: image-2021-03-24-18-08-42-116.png
>
>
> Currently, if we start a native K8s session cluster with K8s HA enabled, we 
> cannot run more than 20 streaming jobs. 
>  
> The latest job stays in the initializing state, and the previous one is 
> created and waiting to be assigned. It seems that some internal resources 
> have been exhausted, e.g. the okhttp thread pool, TCP connections, or 
> something else.
> !image-2021-03-24-18-08-42-116.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)