zentol edited a comment on pull request #16342:
URL: https://github.com/apache/flink/pull/16342#issuecomment-872920907


   > akka.ask.timeout can no longer be used to delay the restart attempts
   
   I'm not too concerned about this; we've had proper APIs to control restart delays for a long time.
   
   > Do we want to backport this improvement to the release-1.12 and release-1.13 branches?
   
   Have we fully grasped the implications of this change yet?
   In the case of a truly unreachable TM, these changes (along with FLINK-23209) are a clear improvement, but what about intermittent networking issues?
   Can the tiniest networking issue cause a TM to be considered unreachable and cause all tasks to be re-deployed? The heartbeat interval and timeout safeguarded against such issues because, in practice, multiple heartbeats had to fail.
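
To make the "multiple heartbeats had to fail" point concrete, here is a rough sketch of the arithmetic. It assumes an interval of 10 s and a timeout of 50 s (which should correspond to the defaults of `heartbeat.interval` and `heartbeat.timeout`, but treat the numbers as illustrative). The class and method names are hypothetical and not part of Flink.

```java
import java.time.Duration;

/** Hypothetical helper, not Flink code: estimates how many heartbeats must be lost in a row. */
final class HeartbeatTolerance {

    /** Number of full heartbeat intervals that fit into the timeout window. */
    public static long intervalsBeforeTimeout(Duration interval, Duration timeout) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return timeout.toMillis() / interval.toMillis();
    }

    public static void main(String[] args) {
        Duration interval = Duration.ofSeconds(10); // assumed heartbeat.interval
        Duration timeout = Duration.ofSeconds(50);  // assumed heartbeat.timeout
        // Roughly this many consecutive heartbeats have to go missing
        // before the heartbeat mechanism alone declares the TM dead.
        System.out.println(intervalsBeforeTimeout(interval, timeout)); // prints 5
    }
}
```

With those values a single dropped message is absorbed by the heartbeat mechanism, which is the safeguard in question; a check that marks the TM unreachable on the first failed connection attempt has no such slack.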
   

