tillrohrmann commented on issue #10089: [FLINK-12342][yarn] Remove container 
requests in order to reduce excess containers
URL: https://github.com/apache/flink/pull/10089#issuecomment-550994823
 
 
   > 1. You are right. The removal of the container requests and the upload 
of files both run in the `YarnResourceManager`'s main thread. I think it is a 
single thread, so if we receive 100 containers in the first heartbeat, it 
will take 5000 ms to launch them (50 ms each). Even if we receive all the 
remaining 900 containers in the second heartbeat, `removeContainerRequest` 
cannot be called in time, so we still have excess containers. I agree with 
you that this could be handled in a future optimization: (1) do not upload 
taskmanager-conf.yaml to HDFS, use dynamic properties instead; (2) use 
`NMClientAsync` instead of `NMClient`.
   
   All right, now I understand the problem, @wangyang0918. Yes, if the main 
thread is blocked (this applies to other blocking RM operations as well), then 
it might happen that we don't process another `onContainersAllocated` in time. 
I think the general solution is to avoid executing blocking operations in 
the main thread. I would suggest changing 
https://issues.apache.org/jira/browse/FLINK-14582 in this direction, with 
avoiding the file write as one possible approach. As you've said, there are 
other approaches as well, like using a dedicated thread or maybe the 
`NMClientAsync`. In fact, the last approach has already been proposed in 
https://issues.apache.org/jira/browse/FLINK-13184.
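   
   To illustrate the "dedicated thread" idea, here is a minimal sketch using only `java.util.concurrent`. It is not Flink or YARN code: the class `NonBlockingLaunchSketch`, its fields, and the 4-thread pool are all hypothetical, and the blocking upload/launch step is simulated with `Thread.sleep`. The point is only that the callback does the cheap bookkeeping (the analogue of `removeContainerRequest`) on the main thread and hands the blocking work to another executor, so a second `onContainersAllocated` is not stuck behind the launches from the first one.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a resource manager; NOT Flink's YarnResourceManager.
final class NonBlockingLaunchSketch {
    // Models the single main thread that handles allocation callbacks.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Dedicated pool for blocking work (file upload, container launch). Size is an assumption.
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);
    private final AtomicInteger launched = new AtomicInteger();
    private final long launchMillis;
    // Only read and written on mainThread after construction.
    private int pendingRequests;

    NonBlockingLaunchSketch(int pendingRequests, long launchMillis) {
        this.pendingRequests = pendingRequests;
        this.launchMillis = launchMillis;
    }

    // Runs on the main thread: cheap bookkeeping only, blocking work is handed off.
    // The returned future completes with the number of still-pending requests.
    CompletableFuture<Integer> onContainersAllocated(int count) {
        return CompletableFuture.supplyAsync(() -> {
            for (int i = 0; i < count && pendingRequests > 0; i++) {
                pendingRequests--;                        // analogue of removeContainerRequest
                ioExecutor.execute(this::blockingLaunch); // does not block the main thread
            }
            return pendingRequests;
        }, mainThread);
    }

    // Simulates the blocking launch (about 50 ms per container in the discussion above).
    private void blockingLaunch() {
        try {
            Thread.sleep(launchMillis);
            launched.incrementAndGet();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int launchedCount() {
        return launched.get();
    }

    // Call only after all callback futures have completed, otherwise a late
    // callback would submit to an executor that is already shut down.
    boolean shutdownAndAwait(long timeoutMillis) {
        mainThread.shutdown();
        ioExecutor.shutdown();
        try {
            return ioExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        NonBlockingLaunchSketch rm = new NonBlockingLaunchSketch(1000, 5);
        rm.onContainersAllocated(100);                        // first heartbeat
        int remaining = rm.onContainersAllocated(900).join(); // second heartbeat
        System.out.println("pending after second callback: " + remaining);
        rm.shutdownAndAwait(10_000);
        System.out.println("launched: " + rm.launchedCount());
    }
}
```

   With this structure the pending-request count is updated as soon as each callback is handled, independent of how long the launches take. Whether a plain executor or `NMClientAsync` is the better fit for the real `YarnResourceManager` is exactly the open question for the two JIRA issues.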
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
