tillrohrmann commented on issue #10089: [FLINK-12342][yarn] Remove container requests in order to reduce excess containers
URL: https://github.com/apache/flink/pull/10089#issuecomment-550994823

> 1. You are right. The removal of the container requests and the upload of files both run in the `YarnResourceManager`'s main thread. I think it is a single thread, so if we receive 100 containers in the first heartbeat, it will take 5000 ms to launch them (50 ms each). Even if we receive all of the remaining 900 containers in the second heartbeat, `removeContainerRequest` cannot be called in time, so we still end up with excess containers. I agree with you that this could be done as a future optimization: (1) do not upload taskmanager-conf.yaml to HDFS, and use dynamic properties instead; (2) use `NMClientAsync` instead of `NMClient`.

All right, now I understand the problem, @wangyang0918. Yes, if the main thread is blocked (this applies to other blocking RM operations as well), then we might not process another `onContainersAllocated` call in time. I think the general solution is to avoid executing blocking operations in the main thread. I would suggest changing https://issues.apache.org/jira/browse/FLINK-14582 in this direction and treating "avoid writing a file" as one possible approach. As you've said, there are other approaches as well, such as using a dedicated thread or the `NMClientAsync`. In fact, the last approach has already been proposed in https://issues.apache.org/jira/browse/FLINK-13184.
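A minimal sketch of the "dedicated thread" idea, for illustration only: it uses plain `java.util.concurrent` executors, with a single-threaded executor standing in for the `YarnResourceManager` main thread, and it simulates each container launch as a 50 ms blocking call (matching the numbers in the quote). The class and method names are hypothetical, and no YARN or Flink API is called. It measures how long a second callback (where `removeContainerRequest` would run) waits when launches block the main thread versus when they are handed off to a separate I/O pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: keep the main thread free by offloading blocking launches. */
public class ContainerLaunchSketch {

    // Simulated blocking launch (stands in for file upload + starting the container).
    static void launchContainerBlocking(int containerId) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Runs a first "allocation callback" that launches firstBatch containers, then
     * a second callback, both on the single main thread. Returns how many ms the
     * second callback waited before it could run.
     */
    static long secondCallbackDelayMillis(boolean offload, int firstBatch) {
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        ExecutorService ioExecutor = Executors.newFixedThreadPool(4);
        try {
            long start = System.nanoTime();
            // First callback: launch containers either inline or on the I/O pool.
            mainThread.submit(() -> {
                for (int i = 0; i < firstBatch; i++) {
                    final int id = i;
                    if (offload) {
                        ioExecutor.submit(() -> launchContainerBlocking(id));
                    } else {
                        launchContainerBlocking(id);
                    }
                }
            });
            // Second callback: this is where pending requests would be removed.
            Callable<Long> probe = () -> TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            Future<Long> second = mainThread.submit(probe);
            return second.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            mainThread.shutdown();
            ioExecutor.shutdownNow(); // interrupts any simulated launches still sleeping
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking:  " + secondCallbackDelayMillis(false, 10) + " ms");
        System.out.println("offloaded: " + secondCallbackDelayMillis(true, 10) + " ms");
    }
}
```

With 10 simulated launches, the blocking variant delays the second callback by roughly 10 × 50 ms, while the offloaded variant lets it run almost immediately. `NMClientAsync` would achieve a similar effect using Hadoop's own asynchronous client instead of a hand-rolled pool.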
