[jira] [Commented] (FLINK-16069) Creation of TaskDeploymentDescriptor can block main thread for long time

Till Rohrmann (Jira) Wed, 31 Mar 2021 02:22:06 -0700


    [ https://issues.apache.org/jira/browse/FLINK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312218#comment-17312218 ]


Till Rohrmann commented on FLINK-16069:
---------------------------------------

Could it be that we are somehow using the {{ActorSystems}} dispatcher for some 
of these tasks? If so, that would explain why the dispatcher thread pool is 
starved. If not, then maybe we can increase the priority of the dispatcher 
thread pool's threads to give them more CPU cycles.
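
To illustrate the second option, here is a minimal sketch in plain Java of a 
thread pool whose threads are created with a raised priority. It does not touch 
Akka's dispatcher configuration; the names PriorityPoolSketch, priorityFactory 
and observedPriority are hypothetical and exist only for this example. Note 
that a Java thread priority is only a hint to the OS scheduler, so it may have 
little or no effect on some platforms.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class PriorityPoolSketch {

    // Hypothetical helper: creates daemon threads with the requested priority.
    static ThreadFactory priorityFactory(String prefix, int priority) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            t.setPriority(priority); // a hint only; the OS may ignore it
            return t;
        };
    }

    // Runs a probe task on the pool and reports the priority of the worker thread.
    static int observedPriority(int priority) {
        ExecutorService pool =
                Executors.newFixedThreadPool(2, priorityFactory("dispatcher", priority));
        try {
            Callable<Integer> probe = () -> Thread.currentThread().getPriority();
            return pool.submit(probe).get();
        } catch (Exception e) {
            throw new IllegalStateException("probe task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(observedPriority(Thread.MAX_PRIORITY));
    }
}
```

Whether this helps depends on the starvation really being a scheduling problem 
rather than the pool's threads being occupied by long-running work.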

> Creation of TaskDeploymentDescriptor can block main thread for long time
> ------------------------------------------------------------------------
>
>                 Key: FLINK-16069
>                 URL: https://issues.apache.org/jira/browse/FLINK-16069
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: huweihua
>            Priority: Major
>         Attachments: FLINK-16069-POC-results
>
>
> Deploying tasks takes a long time when we submit a high-parallelism job. 
> Execution#deploy runs in the main thread, so it blocks the JobMaster from 
> processing other akka messages, such as heartbeats. Creating the 
> TaskDeploymentDescriptor takes most of the time. We can move this creation 
> into a future.
> For example, for a job [source(8000)->sink(8000)], moving all 16000 tasks 
> from SCHEDULED to DEPLOYING took more than 1 minute. This caused the 
> TaskManager heartbeats to time out, and the job never succeeded.
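
For scale, 16000 tasks in more than 60 seconds works out to more than 3.75 ms 
of main-thread time per task on average. The proposal to "put the creation in 
future" can be sketched as follows. This is only an illustration under stated 
assumptions, not Flink's actual code: createDescriptor is a hypothetical 
stand-in for building a TaskDeploymentDescriptor, a single-threaded executor 
stands in for the JobMaster main thread, and the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncDescriptorSketch {

    // Hypothetical stand-in for the expensive descriptor creation.
    static String createDescriptor(int subtaskIndex) {
        return "descriptor-" + subtaskIndex;
    }

    // Builds descriptors on a worker pool and hands each result back to the
    // single "main thread" executor, which only performs the cheap final step.
    static List<String> deployAll(int parallelism) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(4);
        List<String> deployed = new ArrayList<>(); // modified only on mainThread
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                final int index = i;
                futures.add(CompletableFuture
                        // expensive part runs off the main thread
                        .supplyAsync(() -> createDescriptor(index), workers)
                        // cheap part returns to the main thread
                        .thenAcceptAsync(deployed::add, mainThread));
            }
            // Wait for every deployment before returning; completion order may vary.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return deployed;
        } finally {
            workers.shutdown();
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(deployAll(8).size());
    }
}
```

Because the main-thread executor only runs the short hand-off step, other 
messages queued on it (heartbeats, in the scenario above) would be interleaved 
between deployments instead of waiting behind all of them.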



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
