[
https://issues.apache.org/jira/browse/SLING-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Egli resolved SLING-5285.
--------------------------------
Resolution: Fixed
Fixed in http://svn.apache.org/viewvc?rev=1713477&view=rev (that revision
also covers SLING-5284).
> more aggressive self-check for heartbeat timeout
> ------------------------------------------------
>
> Key: SLING-5285
> URL: https://issues.apache.org/jira/browse/SLING-5285
> Project: Sling
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: Discovery Impl 1.2.0
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: Discovery Impl 1.2.2
>
>
> SLING-5195 introduced a self-check that monitors whether the
> HeartbeatHandler is storing heartbeats regularly. This is needed because
> there are several reasons why that might not be the case, e.g. the
> HeartbeatHandler could be blocked by another long-running commit
> happening locally, or by thread-pool exhaustion, or by something else
> entirely.
> The check raised an alarm when the time since the last heartbeat was
> greater than *heartbeatTimeout*. This, however, is not sufficient. The
> comparison should be much more aggressive: it should compare against
> *heartbeatTimeout minus 2 times heartbeatInterval* to leave enough safety
> margin. _2 times_ because 1 time is the bare minimum: this background
> check only _runs_ every heartbeatInterval, so in the worst case it could
> run just one _heartbeatInterval_ before the timeout hits and still be too
> late by a fraction. So 1 is the bare minimum, and the _2_ adds a safety
> margin of only one _heartbeatInterval_.
> *Note:* this also means that heartbeatTimeout should be configured to be
> at least 4-5 times the heartbeatInterval.
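
For illustration only, the comparison described above could be sketched as
follows. This is not the code from the referenced revision: the class and
method names are hypothetical, the values are assumed to be in seconds, and
the strict "greater than" comparison is an assumption taken from the wording
of the description.

```java
// Hypothetical sketch of the more aggressive self-check; not the actual
// Discovery Impl code. All values are assumed to be in seconds.
final class HeartbeatSelfCheck {

    private HeartbeatSelfCheck() {
    }

    /** Age of the last stored heartbeat beyond which the alarm is raised. */
    static long alarmThreshold(long heartbeatTimeout, long heartbeatInterval) {
        // 1 interval is the bare minimum, the 2nd interval is the safety margin
        return heartbeatTimeout - 2 * heartbeatInterval;
    }

    /** New check: compare against heartbeatTimeout minus 2 times heartbeatInterval. */
    static boolean shouldAlarm(long secondsSinceLastHeartbeat,
                               long heartbeatTimeout, long heartbeatInterval) {
        return secondsSinceLastHeartbeat
                > alarmThreshold(heartbeatTimeout, heartbeatInterval);
    }

    /** Follows the note: heartbeatTimeout should be at least 4 times heartbeatInterval. */
    static boolean isConfigSane(long heartbeatTimeout, long heartbeatInterval) {
        return heartbeatInterval > 0 && heartbeatTimeout >= 4 * heartbeatInterval;
    }
}
```

With an assumed heartbeatTimeout of 120 and heartbeatInterval of 30, the
alarm threshold drops from 120 to 60, so a last heartbeat that is 100 seconds
old triggers the alarm under the new comparison but would not have under the
old one.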
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)