kezhuw commented on pull request #15605:
URL: https://github.com/apache/flink/pull/15605#issuecomment-819542959


   To be honest, my first reaction on reading FLINK-21996 was: are we 
building on an unreliable RPC channel? Then I remembered 
`AkkaOptions.ASK_TIMEOUT`. I wonder whether we could solve this by 
specifying a timeout for `TaskExecutorGateway.sendOperatorEventToTask` that is 
much larger than `HeartbeatManagerOptions.HEARTBEAT_INTERVAL`. That way, if the 
"received" future fails, the task manager has already been considered down by 
the heartbeat manager. Is there anything wrong with this approach, or are we 
just being paranoid about unknown errors? My question may stem from my limited 
knowledge of Akka. I assumed Akka messaging is reliable (e.g. messages are 
ordered, and a delivery failure will eventually show up as a heartbeat 
timeout). @StephanEwen @tillrohrmann 
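
A minimal sketch of the timeout idea, in plain Java rather than against the real Flink gateway. The class name, the helper names, and the safety factor of 4 are all hypothetical and only illustrate "ack timeout much larger than the heartbeat interval"; the acknowledgement future stands in for whatever `sendOperatorEventToTask` returns. It needs Java 9+ for `CompletableFuture.orTimeout`.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Flink.
class EventTimeoutSketch {

    // Derive the ack timeout as a multiple of the heartbeat interval, so that
    // the heartbeat manager gets several chances to notice a dead task manager
    // before the ack future gives up.
    static Duration eventTimeout(Duration heartbeatInterval, int safetyFactor) {
        return heartbeatInterval.multipliedBy(safetyFactor);
    }

    // Fail the given ack future with a TimeoutException if it has not
    // completed within the timeout. orTimeout returns the same future.
    static <T> CompletableFuture<T> withTimeout(CompletableFuture<T> ack, Duration timeout) {
        return ack.orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    // Block until the future completes and report whether it failed
    // specifically because of the timeout.
    static boolean timedOut(CompletableFuture<?> future) {
        try {
            future.get();
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof TimeoutException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With something like this, a timed-out ack would be read as "the heartbeat manager should already have declared the task manager lost", which is exactly the assumption I am asking about.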



