[jira] [Resolved] (SLING-5285) more aggressive self-check for heartbeat timeout

Stefan Egli (JIRA) Mon, 09 Nov 2015 09:15:36 -0800

     [ 
https://issues.apache.org/jira/browse/SLING-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stefan Egli resolved SLING-5285.
--------------------------------
    Resolution: Fixed

Fixed in http://svn.apache.org/viewvc?rev=1713477&view=rev (a revision that 
also covers SLING-5284)

> more aggressive self-check for heartbeat timeout
> ------------------------------------------------
>
>                 Key: SLING-5285
>                 URL: https://issues.apache.org/jira/browse/SLING-5285
>             Project: Sling
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.2.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: Discovery Impl 1.2.2
>
>
> SLING-5195 introduced a self-check that monitors whether the 
> HeartbeatHandler is storing its heartbeats regularly. This is done 
> because there are several reasons why that might not be the case, e.g. the 
> HeartbeatHandler could be blocked by another long-running commit 
> happening locally, or by thread-pool exhaustion, or by something else 
> entirely.
> The check raised an alarm when the time since the last heartbeat was 
> greater than *heartbeatTimeout*. This, however, is not sufficient. The 
> comparison should be much more aggressive: it should compare against 
> *heartbeatTimeout minus 2 times heartbeatInterval* to leave enough safety 
> margin. _2 times_ because 1 time is the bare minimum: this 
> background check only _runs_ every heartbeatInterval, so in the worst case it 
> could run just _heartbeatInterval_ seconds before the timeout hits and 
> still be too late by a fraction. So 1 is the bare minimum, and the _2_ 
> adds a safety margin of only one _heartbeatInterval_.
> *Note:* this also means that you should configure the heartbeatTimeout to be 
> at least 4-5 times the heartbeatInterval.
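
The comparison described above can be sketched roughly as follows. This is 
only an illustration of the arithmetic, not the code from the actual commit: 
the class and method names are hypothetical, and all values are assumed to be 
in seconds.

```java
// Hypothetical sketch of the self-check arithmetic; not the Discovery Impl code.
final class HeartbeatSelfCheck {

    // Threshold the self-check compares against:
    // heartbeatTimeout minus 2 times heartbeatInterval (all in seconds).
    static long alarmThresholdSeconds(long heartbeatTimeout, long heartbeatInterval) {
        return heartbeatTimeout - 2 * heartbeatInterval;
    }

    // True when the time since the last stored heartbeat exceeds the threshold.
    static boolean shouldAlarm(long secondsSinceLastHeartbeat,
                               long heartbeatTimeout,
                               long heartbeatInterval) {
        return secondsSinceLastHeartbeat
                > alarmThresholdSeconds(heartbeatTimeout, heartbeatInterval);
    }
}
```

With an assumed heartbeatTimeout of 120 and heartbeatInterval of 30, the 
threshold is 60, so a heartbeat that is 100 seconds old raises the alarm even 
though a plain comparison against heartbeatTimeout would not. With a timeout of 
only 2 times the interval the threshold would drop to 0 and the check would 
fire almost constantly, which is consistent with the 4-5 times recommendation.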



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
