[
https://issues.apache.org/jira/browse/FLINK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393869#comment-17393869
]
Zhilong Hong commented on FLINK-16069:
--------------------------------------
After the optimization done in FLINK-23005, task deployment performance has
improved. In our experiment with a large-scale streaming word count job, task
deployment is roughly 5 to 7 times faster than before.
The results are shown below:
||Parallelism||Before||After||
|8000*8000|32.611s|6.480s|
|16000*16000|128.408s|19.051s|
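For reference, the speedup implied by the two rows above can be worked out with a small calculation. This is only an illustrative sketch: the class and method names are made up for this example, and the inputs are the timings from the table.

```java
final class DeploymentSpeedup {

    /** Ratio of the old deployment time to the new one (both in seconds). */
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        // Timings copied from the table in this comment.
        System.out.printf("8000*8000:   %.2fx%n", speedup(32.611, 6.480));
        System.out.printf("16000*16000: %.2fx%n", speedup(128.408, 19.051));
    }
}
```

The ratios come out at about 5.0 for the 8000*8000 case and about 6.7 for the 16000*16000 case, so the gain grows with parallelism.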
The improvement is also shown in the benchmark implemented in FLINK-20612:
!streaming.png|width=800!
!batch.png|width=800!
In our opinion, this ticket can be closed for now.
> Creation of TaskDeploymentDescriptor can block main thread for long time
> ------------------------------------------------------------------------
>
> Key: FLINK-16069
> URL: https://issues.apache.org/jira/browse/FLINK-16069
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: huweihua
> Priority: Major
> Attachments: FLINK-16069-POC-results, batch.png, streaming.png
>
>
> Deploying tasks takes a long time when we submit a job with high
> parallelism. Execution#deploy runs in the main thread, so it blocks the
> JobMaster from processing other Akka messages, such as heartbeats. Creating
> the TaskDeploymentDescriptor takes most of the time, so we could move its
> creation into a future.
> For example, for a job [source(8000)->sink(8000)], moving all 16000 tasks
> from SCHEDULED to DEPLOYING took more than 1 minute. This caused the
> TaskManager heartbeats to time out, and the job never succeeded.
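
The idea in the description above (build the descriptor in a future so that the main thread stays free for other messages) can be sketched roughly as follows. This is not Flink code and not necessarily what was eventually merged: `Descriptor`, `buildDescriptor`, `deployAll`, and both executors are hypothetical stand-ins, with a single-threaded executor playing the role of the JobMaster main thread and a small pool playing the role of an I/O executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncDescriptorSketch {

    /** Hypothetical stand-in for TaskDeploymentDescriptor. */
    static final class Descriptor {
        final int subtaskIndex;

        Descriptor(int subtaskIndex) {
            this.subtaskIndex = subtaskIndex;
        }
    }

    /** Placeholder for the expensive construction step (assumed, not Flink's logic). */
    static Descriptor buildDescriptor(int subtaskIndex) {
        return new Descriptor(subtaskIndex);
    }

    /** Builds descriptors off the "main thread" and hands each result back to it. */
    static List<Integer> deployAll(int parallelism) {
        // Single thread that models the JobMaster main thread.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        // Pool that models an I/O executor doing the heavy work.
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        // Only ever modified from mainThread, so no extra locking is used here.
        List<Integer> deployed = new ArrayList<>();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < parallelism; i++) {
                final int index = i;
                futures.add(
                    CompletableFuture
                        // Expensive part: runs on the pool, not on the main thread.
                        .supplyAsync(() -> buildDescriptor(index), ioPool)
                        // Cheap part: the state update returns to the main thread.
                        .thenAcceptAsync(d -> deployed.add(d.subtaskIndex), mainThread));
            }
            // Wait for all deployments; a real scheduler would not block like this.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return deployed;
        } finally {
            ioPool.shutdown();
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("deployed tasks: " + deployAll(16).size());
    }
}
```

Between the queued callbacks, the single "main thread" remains available for other work such as heartbeat handling, which is the property the description asks for. Note that the deployments may complete in a different order than they were submitted.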
--
This message was sent by Atlassian Jira
(v8.3.4#803005)