zentol commented on pull request #17606: URL: https://github.com/apache/flink/pull/17606#issuecomment-957870305
> You could enable debug logs and check whether we send heartbeat requests after the TM died.

We do still send requests, but Akka does not report the TM as being unreachable.

> Maybe this is also something we can configure in Akka.

I skimmed the Akka config reference and nothing stood out to me that would match the observed ~2s detection duration. But this could even be an OS-level thing, with it being TCP and all.

> The advantage of increasing the restart attempts instead of the delay is that the test will on average run faster.

I don't have a strong opinion on this. A single restart results in cleaner logs. If we're aiming to do at most 1-2 restarts, then the difference will be negligible either way (±2s) 🤷 The test already needs 30-50 seconds.
