GitHub user tillrohrmann commented on the issue:
https://github.com/apache/flink/pull/2410
Thanks for the contribution @beyond1920. The implementation of the
`HeartbeatScheduler` goes in the right direction so that it is reusable :-) The
testing is also better. However, we're still mixing different things in this PR
(e.g. parts of the slot requesting logic).
I think we can further generalize the heartbeating since the heartbeat
manager is another component which should be reusable across components (e.g.
for the JobManager to heartbeat the TMs). Furthermore, the receiving end of the
heartbeating is not properly defined.
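To make the reuse idea concrete, here is a rough sketch of a component-agnostic heartbeat manager. Everything in it is hypothetical: the class name `HeartbeatManagerSketch`, its methods, and the explicit `nowMs` timestamps are illustration only and are not taken from this PR or from Flink. It assumes the manager, not each connection, decides when a target has timed out.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch, not Flink API: tracks the last heartbeat per target of type K. */
final class HeartbeatManagerSketch<K> {
    private final long timeoutMs;
    private final Consumer<K> onTimeout;
    private final Map<K, Long> lastSeen = new HashMap<>();

    HeartbeatManagerSketch(long timeoutMs, Consumer<K> onTimeout) {
        this.timeoutMs = timeoutMs;
        this.onTimeout = onTimeout;
    }

    /** Starts monitoring a target; the registration time counts as its first heartbeat. */
    void monitorTarget(K target, long nowMs) {
        lastSeen.put(target, nowMs);
    }

    /** Records a heartbeat; heartbeats from unmonitored targets are ignored. */
    void receiveHeartbeat(K target, long nowMs) {
        if (lastSeen.containsKey(target)) {
            lastSeen.put(target, nowMs);
        }
    }

    /** Stops monitoring every target whose last heartbeat is older than the timeout and reports it. */
    void checkTimeouts(long nowMs) {
        Iterator<Map.Entry<K, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Long> entry = it.next();
            if (nowMs - entry.getValue() > timeoutMs) {
                K target = entry.getKey();
                it.remove();
                onTimeout.accept(target);
            }
        }
    }
}
```

Because the target type is a type parameter and the failure reaction is a callback, the same class could serve the ResourceManager as well as the JobManager monitoring its TMs. How `checkTimeouts` gets triggered is left open on purpose.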
I think it would be best if we first properly define what this should look
like. For example, I'm not sure whether the exponential backoff strategy is the
right way to go since it can happen that you wait twice as long as you've
defined until you're notified about a heartbeat failure. Another question is
whether every heartbeat connection should be responsible for triggering itself
or whether the heartbeat manager should be responsible for that. Then we have
to define the receiving end. Is the heartbeat receiving end an independent
`RpcEndpoint`? How does the payload delivery work? Does the sender side ask
for the result (future) or does the receiving side answer via a tell message
to the heartbeat manager?
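To illustrate the backoff concern, here is a small calculation under one possible reading of it. It assumes the per-attempt timeout starts at some initial value, doubles after every missed heartbeat, and that failure is only reported once the attempt with the configured maximum timeout has also expired. The class and method names are made up for this sketch, and the numbers are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: total waiting time of a doubling backoff capped at a maximum timeout. */
final class BackoffDelay {

    /** Per-attempt timeouts: initialMs, 2 * initialMs, ... for as long as they do not exceed maxMs. */
    public static List<Long> timeouts(long initialMs, long maxMs) {
        List<Long> result = new ArrayList<>();
        for (long t = initialMs; t <= maxMs; t *= 2) {
            result.add(t);
        }
        return result;
    }

    /** Time that passes before the last attempt has expired and failure can be reported. */
    public static long totalWait(long initialMs, long maxMs) {
        long sum = 0;
        for (long t : timeouts(initialMs, maxMs)) {
            sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(timeouts(500, 8000));  // [500, 1000, 2000, 4000, 8000]
        System.out.println(totalWait(500, 8000)); // 15500
    }
}
```

With a configured maximum of 8000 ms, the failure is reported after 15500 ms, which is just under twice the configured value. A fixed heartbeat interval with a single timeout would not have this effect.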
I've created an issue where we should continue the discussion:
https://issues.apache.org/jira/browse/FLINK-4478