[ 
https://issues.apache.org/jira/browse/FLINK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354757#comment-17354757
 ] 

Zhu Zhu edited comment on FLINK-16069 at 6/1/21, 2:54 AM:
----------------------------------------------------------

Even if the main thread has the highest priority, GC problems can still 
occur when serialized {{TaskDeploymentDescriptors}} are generated too fast and 
queue up waiting to be sent. The queued {{TaskDeploymentDescriptors}} occupy 
a lot of memory and cannot be garbage-collected until they are sent. Frequent 
young GCs would consume a lot of CPU, and either more heap memory would be 
required or consecutive full GCs could occur.
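
To illustrate the concern, here is a minimal sketch of one way to cap how many
serialized payloads are queued at once. It is not Flink code and not a proposal
from this ticket: {{BoundedDeployer}}, {{deploy}}, and the {{serialize}} / {{send}}
callbacks are hypothetical names, and the sketch assumes serialization runs on a
thread other than the main thread, since the caller blocks when the limit is
reached.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Flink): bounds the number of payloads awaiting send. */
final class BoundedDeployer {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger();

    BoundedDeployer(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /**
     * Waits for a free permit, then serializes and sends one payload.
     * The permit is returned only when the send future completes, so at most
     * maxInFlight serialized payloads are held on the heap at any time.
     */
    CompletableFuture<Void> deploy(
            Supplier<byte[]> serialize,
            Function<byte[], CompletableFuture<Void>> send) {
        // Blocks the caller; assumed to be a non-main thread.
        permits.acquireUninterruptibly();
        CompletableFuture<Void> sent;
        try {
            int now = inFlight.incrementAndGet();
            maxObserved.accumulateAndGet(now, Math::max);
            sent = send.apply(serialize.get());
        } catch (RuntimeException e) {
            // Serialization or send setup failed: give the permit back.
            inFlight.decrementAndGet();
            permits.release();
            throw e;
        }
        return sent.whenComplete((ignored, error) -> {
            // Payload is no longer queued and can be garbage-collected.
            inFlight.decrementAndGet();
            permits.release();
        });
    }

    /** Highest number of payloads that were in flight at the same time. */
    int maxObservedInFlight() {
        return maxObserved.get();
    }
}
```

The trade-off in this sketch is that descriptor generation is slowed down to
the send rate instead of letting the queue grow. Whether that is acceptable
depends on which thread does the generation.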


was (Author: zhuzh):
Even if the main thread can have the highest priority, GC problem can still 
happen when serialized {{TaskDeploymentDescriptor}}s are generated too fast and 
queued to be sent out. These queued {{TaskDeploymentDescriptor}}s will cost 
lots of memory and cannot be {{GC}}ed before sent out. Frequent young GC would 
consume lots of CPU. And more heap memory will be required or consecutive full 
GC could happen.

> Creation of TaskDeploymentDescriptor can block main thread for long time
> ------------------------------------------------------------------------
>
>                 Key: FLINK-16069
>                 URL: https://issues.apache.org/jira/browse/FLINK-16069
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: huweihua
>            Priority: Major
>         Attachments: FLINK-16069-POC-results
>
>
> Deploying tasks takes a long time when we submit a high-parallelism job. 
> Execution#deploy runs in the main thread, so it blocks the JobMaster from 
> processing other Akka messages, such as heartbeats. Creating the 
> TaskDeploymentDescriptor takes most of the time. We could move the creation 
> into a future.
> For example, for a job [source(8000)->sink(8000)], moving all 16000 tasks 
> from SCHEDULED to DEPLOYING took more than 1 minute. This caused the 
> TaskManager heartbeats to time out, and the job never succeeded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
