tillrohrmann opened a new pull request #15105: URL: https://github.com/apache/flink/pull/15105
This PR introduces the notion of rejected registration attempts to `RetryingRegistration` and `RegisteredRpcConnection`.

The `JobManager` uses this to reject connection attempts from `TaskManagers` that believe the `JobManager` is responsible for a different job. On the `TaskManager` side, a rejected connection attempt clears all resources held for that job (slots, partitions), because the `TaskManager` no longer knows where the `JobManager` for the sought-after job is running.

The `ResourceManager` uses the same mechanism to reject connections from a `TaskManager` it does not know. When the `TaskManager` sees the rejection, it shuts itself down. This is better behaviour than waiting for the maximum registration timeout to fire.
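
The difference between a failed attempt (retry) and a rejected attempt (stop and clean up) can be illustrated with a minimal, self-contained sketch. This is not the code from this PR and does not use Flink's API: `RegistrationSketch`, `Response`, `Outcome` and `register` are hypothetical names, and the queue of canned responses stands in for the RPC calls to the registration target.

```java
import java.util.Queue;

/** Hypothetical sketch; none of these names are Flink's actual API. */
final class RegistrationSketch {

    /** The three answers a registration target can give in this sketch. */
    enum Response { SUCCESS, FAILURE, REJECTION }

    /** What the registering side ends up doing. */
    enum Outcome { CONNECTED, RELEASED_RESOURCES, TIMED_OUT }

    /**
     * Polls one canned response per attempt. A FAILURE (or a missing
     * response) leads to another attempt, a REJECTION stops immediately,
     * and exhausting maxAttempts plays the role of the registration timeout.
     */
    static Outcome register(Queue<Response> responses, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Response response = responses.poll();
            if (response == Response.SUCCESS) {
                return Outcome.CONNECTED;
            }
            if (response == Response.REJECTION) {
                // The target explicitly said "not me": stop retrying and
                // let the caller release slots/partitions or shut down.
                return Outcome.RELEASED_RESOURCES;
            }
            // FAILURE or no response: fall through and try again.
        }
        return Outcome.TIMED_OUT;
    }
}
```

With the responses `FAILURE, REJECTION, SUCCESS`, the sketch stops at the second attempt and never consumes the trailing `SUCCESS`; without a rejection signal, the caller would keep retrying until the attempt limit is reached.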
