Stefan Egli created SLING-3382:
----------------------------------
Summary: introduce back-off strategy for topology connector frequency
Key: SLING-3382
URL: https://issues.apache.org/jira/browse/SLING-3382
Project: Sling
Issue Type: Improvement
Components: Extensions
Affects Versions: Discovery Impl 1.0.2
Reporter: Stefan Egli
Assignee: Stefan Egli
Currently, topology heartbeats are sent every 15 or 30 sec, which might seem a
lot, especially as they used to be far too chatty (now fixed with SLING-3377).
[~fmeschbe] suggested lowering this heartbeat frequency. The main reason for a
high heartbeat frequency is quicker failure detection, but that is a trade-off
because it also increases load.
Here's a proposal for how to tackle this:
* introduce two different sets of heartbeats, one for repository and one for
connectors
* the repository heartbeats would remain at the current frequency (suggested
default: 30sec interval, 60sec timeout). The idea is that we want to detect
crashes within a cluster rather quickly, more quickly than in the topology in
general.
* the connectors would get a back-off behavior: initially the values are the
same (30sec/60sec), but heartbeats are then sent less frequently over time, up
to a maximum interval (e.g. 5min). This would have to be controlled by the
receiving side, i.e. both sides of the connector have to agree on the same
interval and timeout.
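
To make the connector back-off more concrete, here is a minimal sketch of how the receiving side could compute the next interval and timeout. This is not existing Discovery Impl code. The class and method names are hypothetical, and two things are assumptions not stated above: the interval doubles after each successful heartbeat, and the timeout stays at twice the interval (the same ratio as the 30sec/60sec default).

```java
// Hypothetical sketch of the proposed connector back-off; not part of Discovery Impl.
final class ConnectorBackoffSketch {

    // Starting interval, matching the suggested 30sec default.
    static final long INITIAL_INTERVAL_SEC = 30;

    // Upper bound for the interval, matching the "eg 5min" example.
    static final long MAX_INTERVAL_SEC = 300;

    private ConnectorBackoffSketch() {
    }

    // Interval the receiving side would announce after a successful heartbeat.
    // Assumption: doubling, capped at MAX_INTERVAL_SEC.
    static long nextIntervalSec(long currentIntervalSec) {
        return Math.min(currentIntervalSec * 2, MAX_INTERVAL_SEC);
    }

    // Timeout matching a given interval.
    // Assumption: always twice the interval, as in the 30sec/60sec default.
    static long timeoutSec(long intervalSec) {
        return intervalSec * 2;
    }
}
```

Starting from `INITIAL_INTERVAL_SEC` and applying `nextIntervalSec` repeatedly gives intervals of 30, 60, 120, 240 and 300 sec, after which the interval stays at 300. If the receiving side returned both values in its heartbeat response, the sender could adopt them as-is, so both ends of the connector would use the same interval and timeout.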
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)