zhuzhurk commented on issue #9058: [FLINK-13166] Add support for batch slot requests to SlotPoolImpl
URL: https://github.com/apache/flink/pull/9058#issuecomment-510007140

Hi Xingtong,

I think you are right that this improvement cannot handle the case you describe. However, fine-grained recovery can work as a fallback: it uses re-scheduling as a way to retry for resources. In this way, B will eventually be assigned the resources that are released from A and returned to the RM.

As Stephan mentioned, relying on failover can be annoying to Flink users. Till's PR aims to improve this by reducing task failovers caused by slot allocation timeouts. It works for most cases, although not all (like the one you mentioned).

---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
