Sounds like a good strategy to me +1
Carsten

2014-02-07 11:38 GMT+01:00 Stefan Egli <[email protected]>:

> Hi,
>
> During an offline discussion, Felix brought up the suggestion to lower the
> topology connector's heartbeat frequency. Currently they are sent every 15
> or 30 sec, which might seem a lot - especially as they were way too chatty
> (which is fixed now with SLING-3377).
>
> The main reason for having a high heartbeat frequency is quicker failure
> detection - but it's obviously a trade-off as it increases load.
>
> I would like to get some opinion on to the following proposal:
>
>  * introduce two different sets of heartbeats, one for repository and
>    one for connectors
>  * the repository ones would remain at the current frequency (suggested
>    default: 30sec interval, 60sec timeout). The idea is that we would want to
>    detect crashes within a cluster rather quickly, more quickly than in the
>    topology in general.
>  * the connectors would get a back-off behavior, where initially the
>    values are the same (30sec/60sec) but then they send out less frequent
>    heartbeats over time, reaching a max (eg 5min). This would have to be
>    controlled by the receiving side, ie both sides of the connector have to
>    agree that interval and timeout are the same.
>
> I've opened a Jira to track this, please comment there:
>
> https://issues.apache.org/jira/browse/SLING-3382
>
> Thanks,
> Cheers,
> Stefan
>

-- 
Carsten Ziegeler
[email protected]
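
For illustration only, the connector back-off described in the quoted proposal could be sketched roughly as below. This is not the Sling implementation: the function names, the doubling factor, and the rule that the timeout is twice the interval are assumptions chosen to match the 30sec/60sec starting values and the 5min cap mentioned in the mail. The proposal itself does not say how quickly the interval should grow.

```python
# Hypothetical sketch of a heartbeat back-off schedule (not Sling code).
# Assumptions: the interval doubles after each heartbeat, starts at 30 s,
# is capped at 300 s (5 min), and the timeout is always 2x the interval.

def connector_heartbeat_interval(beats_sent, initial=30, maximum=300, factor=2):
    """Return the interval in seconds to wait before the next heartbeat."""
    interval = initial * factor ** beats_sent
    return min(interval, maximum)


def timeout_for(interval, multiplier=2):
    """Return the timeout in seconds that both connector sides would agree on."""
    return interval * multiplier


# Intervals for the first six heartbeats and the matching timeouts.
schedule = [connector_heartbeat_interval(n) for n in range(6)]
timeouts = [timeout_for(i) for i in schedule]
```

Since the receiving side would have to control the values, a real implementation would presumably return the next interval and timeout in the heartbeat response so that both ends stay in agreement.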
