zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957870305


   > You could enable debug logs and check whether we send heartbeat requests 
after the TM died.
   
   We do still send requests, but Akka does not report the TM as being 
unreachable.
   
   > Maybe this is also something we can configure in Akka.
   
   I skimmed the Akka config reference and nothing stood out to me that would 
match the observed ~2s detection duration. But this could even be an OS-level 
thing, given that the connection runs over TCP.
   
   > The advantage of increasing the restart attempts instead of the delay is 
that the test will on average run faster.
   
   I don't have a strong opinion on this. A single restart results in cleaner 
logs.
   If we're aiming to do at most 1-2 restarts then the difference will be 
negligible either way (±2s), since the test already needs 30-50 seconds. 🤷
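
To put rough numbers on the attempts-vs-delay trade-off, here is a minimal back-of-the-envelope sketch. It is only a toy model, not Flink code: it assumes the job fails at t=0, that a restart can only succeed once the dead TM has been detected (the ~2s observed above), and that a restart attempted too early fails immediately. The function name and parameters are made up for illustration.

```python
import math


def restarts_until_recovery(detection_s, delay_s, max_attempts):
    """Toy model (assumption, not Flink behaviour).

    The k-th restart is attempted at t = k * delay_s and succeeds only if
    the dead TM has already been detected, i.e. t >= detection_s.
    Returns (attempts_used, recovery_time_s), or None if attempts run out.
    """
    if delay_s <= 0:
        raise ValueError("delay_s must be positive")
    # First attempt index whose start time is at or past the detection time.
    attempts = max(1, math.ceil(detection_s / delay_s))
    if attempts > max_attempts:
        return None
    return attempts, attempts * delay_s


# More attempts with a short delay: recovers as soon as detection allows.
print(restarts_until_recovery(2.0, 1.0, 3))
# Single restart with a conservative delay: cleaner logs, slightly slower.
print(restarts_until_recovery(2.0, 3.0, 1))
# Too few attempts for the chosen delay: the job would fail for good.
print(restarts_until_recovery(2.0, 0.5, 2))
```

Under these assumptions the two variants differ by about a second, which is small next to a 30-50 second test.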


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

