GitHub user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/2410
  
    Thanks for the contribution @beyond1920. The implementation of the 
`HeartbeatScheduler` goes in the right direction so that it is reusable :-) The 
testing is also better. However, we're still mixing different things in this PR 
(e.g. parts of the slot requesting logic).
    
    I think we can further generalize the heartbeating since the heartbeat 
manager is another component which should be reusable across components (e.g. 
for the JobManager to heartbeat the TMs). Furthermore, the receiving end of the 
heartbeating is not properly defined. 
    
    I think it would be best if we first properly define what this should look 
like. For example, I'm not sure whether the exponential backoff strategy is the 
right way to go since it can happen that you wait twice as long as you've 
defined until you're notified about a heartbeat failure. Another question is 
whether every heartbeat connection should be responsible for triggering itself 
or whether the heartbeat manager should be responsible for that. Then we have 
to define the receiving end. Is the heartbeat receiving end an independent 
`RpcEndpoint`? How does the payload delivery work? Does the sender side ask 
for the result (future) or does the receiving side answer via a tell message 
to the heartbeat manager?
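    
    To make the backoff concern concrete, here is a minimal sketch. It is not 
the code from this PR. It assumes a simplified model in which the sender only 
notices that the timeout has passed when its next retry fires, and the class 
and method names (`HeartbeatBackoffSketch`, `detectionTimeWithBackoff`, 
`detectionTimeFixed`) are hypothetical.

```java
// Hypothetical sketch, not Flink code: compares when a heartbeat failure is
// noticed under doubling retry delays versus a fixed retry interval.
// Assumption: the timeout is only checked at the moment a retry fires.
final class HeartbeatBackoffSketch {

    // Returns the elapsed time (ms) at which the first retry fires at or
    // after the timeout, when the delay doubles after every attempt.
    static long detectionTimeWithBackoff(long initialDelayMs, long timeoutMs) {
        if (initialDelayMs <= 0) {
            throw new IllegalArgumentException("initialDelayMs must be positive");
        }
        long elapsed = 0;
        long delay = initialDelayMs;
        while (elapsed < timeoutMs) {
            elapsed += delay; // wait for the next retry
            delay *= 2;       // exponential backoff
        }
        return elapsed;
    }

    // Same check, but with a constant retry interval.
    static long detectionTimeFixed(long intervalMs, long timeoutMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        long elapsed = 0;
        while (elapsed < timeoutMs) {
            elapsed += intervalMs;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long timeoutMs = 8_000;
        // Retries fire at 1s, 3s, 7s, 15s: the 7s check is still below the
        // 8s timeout, so the failure is only noticed at 15s.
        System.out.println("backoff: " + detectionTimeWithBackoff(1_000, timeoutMs) + " ms");
        // Retries fire every 1s, so the failure is noticed at 8s.
        System.out.println("fixed:   " + detectionTimeFixed(1_000, timeoutMs) + " ms");
    }
}
```

    Under these assumptions, an 8 second timeout is detected after 15 seconds 
with doubling delays, which is close to twice the configured value, while a 
fixed 1 second interval detects it after 8 seconds.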
    
    I've created an issue where we should continue the discussion 
https://issues.apache.org/jira/browse/FLINK-4478.

