[jira] [Commented] (SLING-3321) Incorrect caching/timeout behavior with slow health check

Bertrand Delacretaz (JIRA) Thu, 23 Jan 2014 04:47:58 -0800

    [ 
https://issues.apache.org/jira/browse/SLING-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879898#comment-13879898
 ]


Bertrand Delacretaz commented on SLING-3321:
--------------------------------------------

Looks much better after [~cziegeler]'s http://svn.apache.org/r1560567 changes, 
thanks!

I still get some timeouts when running the while loop from the
SLING-3321-log.txt attachment with resultCacheTtlInMs set to 5000, but
timeout results are no longer cached.

To avoid timeouts we could implement a policy that starts executing a new
check once the cached result is older than, say, 50% of its time to live in
the cache, but that's probably better considered together with a general
mechanism to refresh HC results regularly.
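
As a rough illustration of that policy, here is a minimal sketch. The class
RefreshAheadPolicy and its method names are hypothetical and not part of the
Health Check API; it only assumes that the age of the cached result and its
time to live are both known in milliseconds.

```java
/** Hypothetical helper for illustration only, not part of the Sling Health Check API. */
final class RefreshAheadPolicy {
    private final long ttlMs;
    private final long refreshAfterMs;

    /**
     * @param ttlMs           time to live of a cached result, e.g. resultCacheTtlInMs
     * @param refreshFraction fraction of the TTL after which a new execution
     *                        should be started, e.g. 0.5 for 50%
     */
    RefreshAheadPolicy(long ttlMs, double refreshFraction) {
        if (ttlMs <= 0) {
            throw new IllegalArgumentException("ttlMs must be positive");
        }
        if (refreshFraction <= 0.0 || refreshFraction > 1.0) {
            throw new IllegalArgumentException("refreshFraction must be in (0, 1]");
        }
        this.ttlMs = ttlMs;
        this.refreshAfterMs = (long) (ttlMs * refreshFraction);
    }

    /** The cached result may still be served while it is younger than its TTL. */
    boolean isUsable(long ageMs) {
        return ageMs >= 0 && ageMs < ttlMs;
    }

    /** A new execution should be started once the refresh threshold is reached. */
    boolean shouldStartRefresh(long ageMs) {
        return ageMs >= refreshAfterMs;
    }
}
```

With a TTL of 5000 and a fraction of 0.5, a caller would keep serving the
cached result until it is 5000 msec old, and would start a new execution in
the background as soon as the result is 2500 msec old.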

> Incorrect caching/timeout behavior with slow health check
> ---------------------------------------------------------
>
>                 Key: SLING-3321
>                 URL: https://issues.apache.org/jira/browse/SLING-3321
>             Project: Sling
>          Issue Type: Bug
>          Components: Health Check
>    Affects Versions: Health Check Core 1.0.8
>            Reporter: Bertrand Delacretaz
>            Assignee: Bertrand Delacretaz
>         Attachments: SLING-3321-log.txt
>
>
> We might not need to fix this right now, just making a note of some tests I 
> did with the SlowHealthCheckSample.
> By default SlowHealthCheckSample takes 1200-3700 msec to execute, and I have 
> set the cache lifetime to 5 seconds.
> With these settings, executing the health check every second should always 
> provide a result: even if a particular execute call takes more than the 
> default 2 seconds execution timeout, an older cached result should still be 
> available, as 3700 (max execution time) + 1000 (execution period) is smaller 
> than 5000 (time to live in cache).
> I'll attach an execution log which shows that this is not the case. I see two 
> problems:
> # A result which times out is cached and reused, even though the actual 
> execution might have finished in the meantime. We then get a timeout result 
> and the actual result is thrown away. There's no " execution counter=2" 
> result in my log for example.
> # There's no way to say "execute the health check, but if it times out use an 
> older result if still valid". We might need an execution option for that, as 
> you don't always want that.
> I think this is a realistic use case; checking external systems, for example, 
> might have that kind of timing characteristics. I should be able to call the 
> executor for such an HC every second, for example, and get a result every 
> time.
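
The two problems listed in the quoted description can be sketched as follows.
This is only an illustration under stated assumptions, not the Health Check
executor's actual implementation: the class CachedFallbackExecutor, the
String result type and the "TIMEOUT" marker are all hypothetical. On timeout
the sketch returns an older cached result if one is still valid, and it lets
the slow execution finish so that its real result is cached instead of being
thrown away.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch, not the Sling Health Check executor. */
final class CachedFallbackExecutor {
    /** Immutable cache entry: a result and the time it was stored. */
    private static final class Entry {
        final String result;
        final long storedAtMs;

        Entry(String result, long storedAtMs) {
            this.result = result;
            this.storedAtMs = storedAtMs;
        }
    }

    // Daemon threads, so a slow check never keeps the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "hc-sketch");
        t.setDaemon(true);
        return t;
    });
    private final AtomicReference<Entry> cache = new AtomicReference<>();
    private final long ttlMs;

    CachedFallbackExecutor(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    private static long nowMs() {
        return System.nanoTime() / 1_000_000L;
    }

    String execute(Callable<String> check, long timeoutMs)
            throws InterruptedException, ExecutionException {
        Future<String> running = pool.submit(() -> {
            String result = check.call();
            // Only real results are cached, even if they arrive after the timeout.
            cache.set(new Entry(result, nowMs()));
            return result;
        });
        try {
            return running.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Do not cancel: the execution may still finish and refresh the cache.
            Entry old = cache.get();
            if (old != null && nowMs() - old.storedAtMs < ttlMs) {
                return old.result;
            }
            // No valid older result; the timeout itself is never cached.
            return "TIMEOUT";
        }
    }
}
```

In a real executor the fallback would probably be an execution option rather
than the default, since, as noted in the description, it is not always wanted.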



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
