[ 
https://issues.apache.org/jira/browse/FLINK-34566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823396#comment-17823396
 ] 

Fei Feng commented on FLINK-34566:
----------------------------------

Thanks [~gyfora] for commenting on this issue.
First, I am not saying JOSDK has a bug; I think JOSDK just changed the creation
logic of the reconciliation thread pool.
And yes, "The thread pool max size is set correctly", but in practice we can
never reach the parallelism we defined as the maximum.
As the Javadoc says:
"If there are more than corePoolSize but less than maximumPoolSize threads 
running, {*}a new thread will be created only if the queue is full.{*}"
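A minimal sketch of the rule quoted above, using only java.util.concurrent.ThreadPoolExecutor. The class name and the sizes (core 1, max 3, a bounded ArrayBlockingQueue with capacity 2) are arbitrary illustration values, not the operator's configuration. Every task blocks on a latch, and the pool grows past corePoolSize only once the bounded queue is full.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueFullDemo {

    // Submits `tasks` blocking tasks and records the pool size after each submission.
    static int[] poolSizesAfterEachSubmit(int tasks) {
        // core = 1, max = 3, bounded queue with room for 2 waiting tasks
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 3, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        CountDownLatch release = new CountDownLatch(1);
        int[] sizes = new int[tasks];
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy so queued tasks stay queued
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                sizes[i] = pool.getPoolSize();
            }
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(poolSizesAfterEachSubmit(5)));
    }
}
```

Running main should print [1, 1, 1, 2, 3]: task 1 starts the core thread, tasks 2 and 3 wait in the queue, and tasks 4 and 5 each get an extra thread only because the queue is already full.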

This means we can only get a parallelism equal to corePoolSize; other events
are put into the work queue and wait to be handled. Let's look at the queuing
mechanism in detail:

"Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue 
without a predefined capacity) will cause new tasks to wait in the queue when 
all corePoolSize threads are busy. {*}Thus, no more than corePoolSize threads 
will ever be created. (And the value of the maximumPoolSize therefore doesn't 
have any effect.){*}"
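A minimal sketch of the unbounded-queue case, using only java.util.concurrent. The class name and the values core 2, max 10 and 20 tasks are arbitrary illustration values. Every submitted task blocks, yet the pool never grows past corePoolSize; the remaining tasks simply wait in the LinkedBlockingQueue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {

    // Submits `tasks` blocking tasks and returns {threads in pool, tasks waiting in queue}.
    static int[] snapshot(int core, int max, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>()); // unbounded queue
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep every started task busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // offer() to an unbounded queue never fails, so no thread beyond corePoolSize is added
            return new int[] {pool.getPoolSize(), pool.getQueue().size()};
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] s = snapshot(2, 10, 20);
        System.out.println("threads=" + s[0] + ", queued=" + s[1]);
    }
}
```

main should print threads=2, queued=18: maximumPoolSize=10 never comes into play.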

The Javadoc states very clearly that when an unbounded queue is used (and JOSDK
uses exactly a LinkedBlockingQueue :( ), the maximumPoolSize setting is
effectively ignored, so the parallelism is capped at corePoolSize.
That is why in the Flink Kubernetes Operator the reconciliation parallelism is
only 10 (too small for a large-scale k8s cluster), even when we configure a large
maximumPoolSize.
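A minimal sketch that checks the actual parallelism rather than the configured one, using only java.util.concurrent. Each task blocks until released, and the hypothetical concurrentlyRunning method counts how many tasks are running at the same time. The first call mimics the pool shape described in this issue (core 10 with a larger maximum; 50 is an arbitrary example value), the second a fixed pool where core equals max. Class and method names are illustrative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ParallelismCheck {

    // Submits `tasks` blocking tasks and returns how many of them run at the same time
    // on a pool with the given core/max sizes and an unbounded LinkedBlockingQueue.
    static long concurrentlyRunning(int core, int max, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    started.countDown(); // this task has been picked up by a worker thread
                    try {
                        release.await(); // hold the worker so queued tasks cannot start
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Returns early if every task starts; otherwise waits 2 s and reports how many did
            started.await(2, TimeUnit.SECONDS);
            return tasks - started.getCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            release.countDown(); // let all tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // core 10, larger max, unbounded queue (shape described in this issue)
        System.out.println("core=10, max=50: " + concurrentlyRunning(10, 50, 50));
        // fixed pool: core == max
        System.out.println("core=50, max=50: " + concurrentlyRunning(50, 50, 50));
    }
}
```

The first line should report 10 (after the 2 s timeout) and the second 50, which is the gap between the configured and the effective parallelism.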

> Flink Kubernetes Operator reconciliation parallelism setting not work
> ---------------------------------------------------------------------
>
>                 Key: FLINK-34566
>                 URL: https://issues.apache.org/jira/browse/FLINK-34566
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.7.0
>            Reporter: Fei Feng
>            Priority: Blocker
>         Attachments: image-2024-03-04-10-58-37-679.png, 
> image-2024-03-04-11-17-22-877.png, image-2024-03-04-11-31-44-451.png
>
>
> After we upgraded JOSDK from version 4.3.0 to version 4.4.2 in FLINK-33005,
> we can no longer increase the reconciliation parallelism: the maximum
> reconciliation parallelism is only 10. This causes a reconciliation delay of
> about 10-30 seconds for FlinkDeployment and SessionJob resources when we have
> large-scale Flink session clusters and session jobs in the k8s cluster.
>  
> After investigating and validating, I found that the reason is that the
> reconciliation thread pool creation logic in JOSDK changed significantly
> between these two versions.
> v4.3.0: 
> the reconciliation thread pool was created as a FixedThreadPool (maximumPoolSize
> was the same as corePoolSize), so we passed the number of reconciliation threads
> and got a thread pool that matched our expectations.
> !image-2024-03-04-10-58-37-679.png|width=497,height=91!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.3.0/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationServiceOverrider.java#L198]
>  
> But in v4.4.2:
> the reconciliation thread pool is created as a custom executor to which we
> can pass corePoolSize and maximumPoolSize. The problem is that we only set
> the maximumPoolSize of the thread pool, while its corePoolSize defaults to 10.
> As a result, the thread pool size stays at 10 and most events wait in the
> workQueue for a while.
> !image-2024-03-04-11-17-22-877.png|width=569,height=112!
> [https://github.com/operator-framework/java-operator-sdk/blob/v4.4.2/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ExecutorServiceManager.java#L37]
>  
> The solution is also simple: we can create the thread pool in the Flink
> Kubernetes Operator and pass it to JOSDK, so that we control the
> reconciliation thread pool directly, for example:
> !image-2024-03-04-11-31-44-451.png|width=483,height=98!
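The attached image is not reproduced in this message. As a minimal sketch of the idea, assuming only the standard java.util.concurrent API, the operator could build the pool itself with corePoolSize equal to maximumPoolSize, so the configured value is the real parallelism even with an unbounded queue. The class name ReconcilerPoolFactory, the create method and the "reconciler-" thread name prefix are hypothetical. How the pool is handed to JOSDK (for example through the ConfigurationServiceOverrider linked above) depends on the JOSDK version and is not shown here.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReconcilerPoolFactory {

    // Hypothetical helper: builds a pool whose effective parallelism equals `parallelism`.
    static ThreadPoolExecutor create(int parallelism) {
        AtomicInteger counter = new AtomicInteger();
        // Hypothetical naming scheme so reconciliation threads are easy to spot in thread dumps
        ThreadFactory factory = r -> new Thread(r, "reconciler-" + counter.incrementAndGet());
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                parallelism,           // corePoolSize: the threads that are actually used
                parallelism,           // maximumPoolSize: kept equal, an unbounded queue never grows past core
                60L, TimeUnit.SECONDS, // idle timeout, applied to core threads below
                new LinkedBlockingQueue<>(),
                factory);
        pool.allowCoreThreadTimeOut(true); // idle threads exit instead of staying alive forever
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(50); // 50 is an arbitrary example value
        System.out.println(pool.getCorePoolSize() + " / " + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

allowCoreThreadTimeOut(true) is an optional design choice: with a large parallelism it avoids keeping every thread alive while the operator is idle, and the pool still grows back to the full size under load because the worker count is below corePoolSize again.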



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
