GitHub user sihuazhou commented on the issue:
https://github.com/apache/flink/pull/6087
Hi @zhangminglei, I would suggest retrying. That is more consistent with
the behavior we get when using the sync API. In fact, currently, if we don't
retry starting the container, the job will fail to acquire slots, and the RM
will also stop starting containers for the jobs that are waiting for slots.
