pltbkd commented on pull request #16357: URL: https://github.com/apache/flink/pull/16357#issuecomment-874688034
> Well one needs to ask themselves why it is that the timeout is multiple times the interval.
>
> If the timeout is that large because a target should truly only be considered unreachable if nothing got through during this entire period, then in any case both mechanism will work the same way (<= because users configure it that way).

In fact, I saw this in `HeartbeatManagerOptions`, where it is defined via `defaultValue` even though that method is deprecated. Am I looking at the right place?

> IOW, any RPC message would be treated like a heartbeat request, and heartbeats are just a way to ensure periodic communication.

Here is a counterexample. Assume the heartbeat interval is 10s and the timeout is 5s. The JM sends an RPC request (not a heartbeat) that takes more than 5s to process on the remote TM, and no response to any other RPC request or heartbeat request arrives during those 5s. That request would then cause a heartbeat timeout even though nothing is wrong. The main difference is that a heartbeat request is expected to be answered as soon as possible, while an RPC request may take some time before it is answered.
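To make the counterexample concrete, here is a minimal Python sketch under a simplified, hypothetical model. It is not Flink's `HeartbeatManager` logic, and `Request`, `detect_timeout` and `rpc_counts_as_probe` are made-up names for illustration. In this model each "probe" request expects a response of any kind from the TM within `timeout` seconds of being sent, and network latency is ignored. Treating every RPC as a probe (i.e. "any RPC message would be treated like a heartbeat request") flags a healthy TM that is simply busy with a slow RPC, while probing only with heartbeats, which are answered immediately, does not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    sent_at: float       # time (s) at which the JM sends the request
    processing: float    # time (s) the TM needs before it responds
    is_heartbeat: bool   # True for a heartbeat request, False for a regular RPC


def detect_timeout(requests: List[Request], timeout: float,
                   rpc_counts_as_probe: bool) -> Optional[float]:
    """Return the time at which the JM would report a heartbeat timeout, or None.

    Hypothetical model, not Flink's implementation: every probe request
    expects SOME response from the TM (to any request) within `timeout`
    seconds of being sent. If `rpc_counts_as_probe` is True, regular RPCs
    are probes too, i.e. any RPC is treated like a heartbeat request.
    """
    # Times at which the JM receives a response of any kind (no network delay).
    responses = [r.sent_at + r.processing for r in requests]
    probes = [r for r in requests if r.is_heartbeat or rpc_counts_as_probe]
    for probe in sorted(probes, key=lambda r: r.sent_at):
        deadline = probe.sent_at + timeout
        # The probe is satisfied by any response arriving before its deadline.
        if not any(probe.sent_at <= t <= deadline for t in responses):
            return deadline
    return None


# Numbers from the comment: heartbeat interval 10s, timeout 5s.
INTERVAL, TIMEOUT = 10.0, 5.0
# Heartbeats are assumed to be answered immediately (processing time 0).
heartbeats = [Request(i * INTERVAL, 0.0, True) for i in range(3)]
# A regular RPC sent at t=1s that takes 6s (> 5s) to process on a healthy TM.
slow_rpc = Request(1.0, 6.0, False)

print(detect_timeout(heartbeats + [slow_rpc], TIMEOUT, rpc_counts_as_probe=True))   # 6.0
print(detect_timeout(heartbeats + [slow_rpc], TIMEOUT, rpc_counts_as_probe=False))  # None
```

With RPCs counted as probes, the slow RPC sent at t=1s reaches its deadline at t=6s before its own response arrives at t=7s, so a false timeout is reported. With heartbeat-only probing, every probe is answered at once and no timeout occurs.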
