tillrohrmann commented on issue #10089: [FLINK-12342][yarn] Remove container requests in order to reduce excess containers
URL: https://github.com/apache/flink/pull/10089#issuecomment-550253030

> Moving the `removeContainerRequest` first in `onContainersAllocated` is a reasonable fix. And it could reduce the excess containers to some extent. However, it could not solve the problem completely. Uploading a config file to HDFS will take 50ms in many production HDFS clusters. It means only 10 containers could be started. If we receive more than 10 containers, the `CallbackHandlerThread` of the YARN AM-RM client will be blocked. So `removeContainerRequest` may not be called in time.

I think we don't execute the removal of the container requests and the upload of files in the `CallbackHandlerThread` but in the `YarnResourceManager`'s main thread. Hence, we should call `removeContainerRequest` fast enough once we start processing the callback within the main thread.

> * All containers are returned in one heartbeat. There are no excess containers.
> * Containers are returned in two or more heartbeats. More than 3000 containers are allocated from YARN. It depends on whether HDFS is busy.

Could you explain the second point a bit more? How did you provoke that the containers are returned in two or more heartbeats? Was the `ResourceManager` busy with other things?
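To make the ordering argument concrete, here is a minimal sketch of "remove the request first, then do the slow work" inside a single allocation callback. It is a simplified model, not Flink's `YarnResourceManager`: the class `ContainerRequestSketch`, the plain queue standing in for the requests registered with the AM-RM client, and the `startContainer` placeholder are all hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the callback ordering; not Flink's actual code.
class ContainerRequestSketch {
    // Stand-in for the container requests still registered with YARN.
    private final Deque<String> pendingRequests = new ArrayDeque<>();
    private final List<String> started = new ArrayList<>();
    private final List<String> excess = new ArrayList<>();

    void requestContainer(String requestId) {
        pendingRequests.add(requestId);
    }

    // Assumed to run on a single "main" thread, one callback at a time.
    void onContainersAllocated(List<String> containers) {
        List<String> accepted = new ArrayList<>();
        // Step 1: cheap bookkeeping first. Every allocated container either
        // consumes one pending request or is marked as excess.
        for (String container : containers) {
            if (pendingRequests.poll() != null) {
                accepted.add(container);
            } else {
                excess.add(container);
            }
        }
        // Step 2: only now do the slow per-container work.
        for (String container : accepted) {
            startContainer(container);
        }
    }

    private void startContainer(String container) {
        // Placeholder for the slow part (for example, uploading a config file).
        started.add(container);
    }

    int pendingCount() { return pendingRequests.size(); }
    int startedCount() { return started.size(); }
    int excessCount() { return excess.size(); }

    public static void main(String[] args) {
        ContainerRequestSketch rm = new ContainerRequestSketch();
        rm.requestContainer("req-1");
        rm.requestContainer("req-2");
        // First heartbeat delivers one container more than requested.
        rm.onContainersAllocated(Arrays.asList("c1", "c2", "c3"));
        // A later heartbeat finds no pending requests left.
        rm.onContainersAllocated(Arrays.asList("c4"));
        System.out.println("pending=" + rm.pendingCount()
                + " started=" + rm.startedCount()
                + " excess=" + rm.excessCount());
    }
}
```

Because the pending requests are cleared before any slow work begins, the time spent starting containers no longer affects how long the requests stay registered. Whether this is enough in practice still depends on how quickly the main thread gets to the callback, which is the open question above.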
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
