Stefan Egli created SLING-3382:
----------------------------------

             Summary: introduce back-off strategy for topology connector frequency
                 Key: SLING-3382
                 URL: https://issues.apache.org/jira/browse/SLING-3382
             Project: Sling
          Issue Type: Improvement
          Components: Extensions
    Affects Versions: Discovery Impl 1.0.2
            Reporter: Stefan Egli
            Assignee: Stefan Egli


Currently, topology heartbeats are sent every 15 or 30 sec, which might seem a 
lot, especially as they used to be far too chatty (now fixed with 
SLING-3377). [~fmeschbe] suggests lowering this heartbeat frequency.

The main reason for having a high heartbeat frequency is quicker failure 
detection – but it's obviously a trade-off as it increases load.

Here's a proposal for how to tackle this:

 * introduce two different sets of heartbeats, one for repository and one for 
connectors
 * the repository ones would remain at the current frequency (suggested 
default: 30sec interval, 60sec timeout). The idea is that we would want to 
detect crashes within a cluster rather quickly, more quickly than in the 
topology in general.
 * the connectors would get a back-off behavior: initially the values are the 
same (30sec/60sec), but heartbeats are then sent less frequently over time, 
until the interval reaches a maximum (e.g. 5min). This would have to be 
controlled by the receiving side, i.e. both sides of the connector have to 
agree on the same interval and timeout.
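
The connector back-off described above could be sketched roughly as follows. 
This is only an illustration under stated assumptions, not existing Discovery 
Impl code: the class ConnectorBackoff and all of its members are hypothetical, 
the doubling step and the "back off after N successful heartbeats" rule are 
assumptions, and the timeout is assumed to stay at twice the interval (as in 
the 30sec/60sec default). The receiving side would own this state and announce 
the returned interval to the sender, so that both sides use the same values.

```java
/**
 * Hypothetical sketch of a receiver-controlled back-off schedule for
 * connector heartbeats. None of these names exist in Sling Discovery.
 */
final class ConnectorBackoff {

    private final long initialIntervalSec;
    private final long maxIntervalSec;
    // Assumption: back off only after this many successful heartbeats in a row.
    private final int stableBeatsPerStep;

    private long currentIntervalSec;
    private int beatsAtCurrentInterval;

    ConnectorBackoff(long initialIntervalSec, long maxIntervalSec, int stableBeatsPerStep) {
        if (initialIntervalSec <= 0 || maxIntervalSec < initialIntervalSec || stableBeatsPerStep <= 0) {
            throw new IllegalArgumentException("invalid back-off configuration");
        }
        this.initialIntervalSec = initialIntervalSec;
        this.maxIntervalSec = maxIntervalSec;
        this.stableBeatsPerStep = stableBeatsPerStep;
        this.currentIntervalSec = initialIntervalSec;
    }

    /**
     * Called on the receiving side for every heartbeat that arrived in time.
     * Returns the interval (in seconds) the sender should use from now on.
     */
    long onHeartbeat() {
        beatsAtCurrentInterval++;
        if (beatsAtCurrentInterval >= stableBeatsPerStep && currentIntervalSec < maxIntervalSec) {
            // Assumption: double the interval per step, capped at the maximum.
            currentIntervalSec = Math.min(currentIntervalSec * 2, maxIntervalSec);
            beatsAtCurrentInterval = 0;
        }
        return currentIntervalSec;
    }

    /** Assumption: the timeout stays at twice the interval (30sec -> 60sec). */
    long timeoutSec() {
        return currentIntervalSec * 2;
    }

    /** Falls back to the initial (fast) interval, e.g. after a missed heartbeat. */
    void reset() {
        currentIntervalSec = initialIntervalSec;
        beatsAtCurrentInterval = 0;
    }
}
```

With an initial interval of 30sec, a maximum of 5min and a step after every 
third heartbeat, the interval would move through 30, 60, 120, 240 and then stay 
at 300 seconds. Resetting to the fast interval after a missed heartbeat is one 
possible design choice; it is not part of the proposal itself.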



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
