zentol edited a comment on pull request #16342: URL: https://github.com/apache/flink/pull/16342#issuecomment-872920907
> akka.ask.timeout can no longer be used to delay the restart attempts

I'm not too concerned about this; we've had proper APIs to control restart delays for a long time.

> Do we want to backport this improvement to the release-1.12. and release-1.13 branch?

Have we fully grasped the implications of this change yet? In the case of a truly unreachable TM, these changes (along with FLINK-23209) are a clear improvement, but what about cases of intermittent networking issues? Can the tiniest networking issue cause a TM to be considered unreachable and cause all tasks to be re-deployed? The heartbeat interval+timeout safeguarded against such issues because in practice multiple heartbeats had to fail.
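
To make the "multiple heartbeats had to fail" point concrete, here is a rough sketch of the arithmetic. The class and method names are hypothetical and not part of Flink, and the 10 s interval and 50 s timeout are assumed values for `heartbeat.interval` and `heartbeat.timeout`; check your own configuration before relying on them.

```java
public final class HeartbeatTolerance {

    /**
     * Rough estimate of how many consecutive heartbeats have to be missed
     * before a heartbeat timeout fires, assuming one heartbeat is sent every
     * intervalMs and each received heartbeat resets the timeout.
     */
    static long approxMissedHeartbeats(long intervalMs, long timeoutMs) {
        if (intervalMs <= 0 || timeoutMs <= 0) {
            throw new IllegalArgumentException("interval and timeout must be positive");
        }
        // Integer division: partial intervals do not count as a missed heartbeat.
        return timeoutMs / intervalMs;
    }

    public static void main(String[] args) {
        long intervalMs = 10_000L; // assumed heartbeat.interval
        long timeoutMs = 50_000L;  // assumed heartbeat.timeout
        System.out.println(approxMissedHeartbeats(intervalMs, timeoutMs)); // prints 5
    }
}
```

Under these assumed values a single dropped message is absorbed, since about five consecutive heartbeats have to be lost before the TM is declared dead. A check that marks the TM unreachable on the first failed RPC would have no comparable slack unless it is added explicitly.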
