[
https://issues.apache.org/jira/browse/SLING-3382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Egli resolved SLING-3382.
--------------------------------
Resolution: Fixed
Fix Version/s: Discovery Impl 1.0.4
Implemented the following backoff behavior:
* for connectors that have stable announcements (i.e. where the announcements
are identical) for a number of heartbeatIntervals, the servlet instructs the
client to use a 'backoffInterval', which is increased up to a maximum.
* the maximum backoffInterval is configurable and currently set to 5 times the
heartbeatInterval (the config parameter is relative to the heartbeatInterval)
* whenever the client sends a different announcement (i.e. anything changes from
its point of view in the topology), the backoffInterval is reset
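For illustration, the behavior described above could look roughly like the
sketch below. This is not the Discovery Impl code: the class name
ConnectorBackoff, the stableThreshold parameter, and the choice to grow the
interval by one heartbeatInterval per stable announcement are all assumptions
made for the example.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of the servlet-side backoff bookkeeping for one
 * connector. Names and the linear growth rule are assumptions.
 */
class ConnectorBackoff {
    private final long heartbeatIntervalSecs;
    private final int maxBackoffFactor;  // relative maximum, e.g. 5
    private final int stableThreshold;   // assumed: repeats needed before backing off

    private String lastAnnouncement;
    private int stableCount;
    private int factor = 1;

    ConnectorBackoff(long heartbeatIntervalSecs, int maxBackoffFactor, int stableThreshold) {
        this.heartbeatIntervalSecs = heartbeatIntervalSecs;
        this.maxBackoffFactor = maxBackoffFactor;
        this.stableThreshold = stableThreshold;
    }

    /**
     * Records an incoming announcement and returns the interval (in seconds)
     * the client should be told to use for its next heartbeat.
     */
    long onAnnouncement(String announcement) {
        if (!Objects.equals(announcement, lastAnnouncement)) {
            // Something changed from the client's point of view: reset.
            lastAnnouncement = announcement;
            stableCount = 0;
            factor = 1;
        } else {
            stableCount++;
            // Only back off once the announcement has been stable long
            // enough, and never beyond the configured maximum.
            if (stableCount >= stableThreshold && factor < maxBackoffFactor) {
                factor++;
            }
        }
        return heartbeatIntervalSecs * factor;
    }
}
```

With a 30 sec heartbeatInterval, a maximum factor of 5 and an assumed
threshold of 2, repeated identical announcements would yield 30, 30, 60, 90,
120, 150, 150, ... and any changed announcement would drop back to 30.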
> introduce back-off strategy for topology connector frequency
> ------------------------------------------------------------
>
> Key: SLING-3382
> URL: https://issues.apache.org/jira/browse/SLING-3382
> Project: Sling
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: Discovery Impl 1.0.2
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: Discovery Impl 1.0.4
>
>
> Currently topology heartbeats are sent every 15 or 30 sec, which might seem a
> lot – especially as they were way too chatty (which is fixed now with
> SLING-3377). The suggestion by [~fmeschbe] is to lower this heartbeat
> frequency.
> The main reason for having a high heartbeat frequency is quicker failure
> detection – but it's obviously a trade-off as it increases load.
> Here's a proposal for how to tackle this:
> * introduce two different sets of heartbeats, one for repository and one for
> connectors
> * the repository ones would remain at the current frequency (suggested
> default: 30sec interval, 60sec timeout). The idea is that we would want to
> detect crashes within a cluster rather quickly, more quickly than in the
> topology in general.
> * the connectors would get a back-off behavior, where initially the values
> are the same (30sec/60sec) but then they send out less frequent heartbeats
> over time, reaching a max (e.g. 5min). This would have to be controlled by the
> receiving side, i.e. both sides of the connector have to agree on the same
> interval and timeout.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)