tillrohrmann commented on issue #10089: [FLINK-12342][yarn] Remove container 
requests in order to reduce excess containers
URL: https://github.com/apache/flink/pull/10089#issuecomment-550253030
 
 
   > Moving the `removeContainerRequest` first in `onContainersAllocated` is a 
reasonable fix, and it could reduce the excess containers to some extent. 
However, it would not solve the problem completely. Uploading a config file to 
HDFS takes about 50 ms in many production HDFS clusters. It means only 10 
containers could be started. If we receive more than 10 containers, the 
`CallbackHandlerThread` of the YARN AM-RM client will be blocked, so 
`removeContainerRequest` may not be called in time.
   
   I think we execute the removal of the container requests and the 
upload of files not in the `CallbackHandlerThread` but in the 
`YarnResourceManager`'s main thread. Hence, we should call 
`removeContainerRequest` fast enough once we start processing the callback 
within the main thread.
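
   To illustrate the ordering argued for here, the following is a minimal sketch and not Flink's actual `YarnResourceManager` code. It assumes a single-threaded executor standing in for the main thread, a plain counter standing in for the pending container requests, and a sleep standing in for the HDFS upload. The class `CallbackHandoffSketch` and all of its members are hypothetical names. The callback only hands the batch over; the main thread then removes the requests for the whole batch before it starts any slow upload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from Flink or YARN.
class CallbackHandoffSketch {
    // Single-threaded executor standing in for the resource manager's main thread.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Only touched from mainThread until the executor has terminated.
    private final List<String> events = new ArrayList<>();
    private int pendingRequests;
    private final long uploadMillis;

    CallbackHandoffSketch(int pendingRequests, long uploadMillis) {
        this.pendingRequests = pendingRequests;
        this.uploadMillis = uploadMillis;
    }

    // Invoked on the callback thread: hands the batch over and returns at once.
    void onContainersAllocated(List<String> containers) {
        List<String> batch = new ArrayList<>(containers);
        mainThread.execute(() -> {
            // Cheap bookkeeping first: drop one pending request per container.
            List<String> accepted = new ArrayList<>();
            for (String container : batch) {
                if (pendingRequests > 0) {
                    pendingRequests--;
                    events.add("remove-request:" + container);
                    accepted.add(container);
                } else {
                    events.add("return-excess:" + container);
                }
            }
            // Slow work afterwards: simulated upload, then container start.
            for (String container : accepted) {
                simulateUpload();
                events.add("start:" + container);
            }
        });
    }

    private void simulateUpload() {
        try {
            Thread.sleep(uploadMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Waits for queued work to finish, then exposes the recorded order.
    List<String> shutdownAndGetEvents() {
        mainThread.shutdown();
        try {
            if (!mainThread.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("main thread did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        CallbackHandoffSketch sketch = new CallbackHandoffSketch(2, 50);
        sketch.onContainersAllocated(Arrays.asList("c1", "c2", "c3"));
        System.out.println(sketch.shutdownAndGetEvents());
    }
}
```

   In this sketch the upload time delays only the container starts, not the request removal, which is the property the fix relies on.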
   
   >* All containers are returned in one heartbeat. There are no excess 
containers.
   >* Containers are returned in two or more heartbeats. More than 3000 
containers are allocated from YARN. It depends on whether HDFS is busy.
   
   Could you explain the second point a bit more? How did you provoke the 
containers being returned in two or more heartbeats? Was the `ResourceManager` 
busy with other things?

