Timothee Maret created SLING-8531:
-------------------------------------

             Summary: Support JournalAvailabilityChecker exponential backoff 
                 Key: SLING-8531
                 URL: https://issues.apache.org/jira/browse/SLING-8531
             Project: Sling
          Issue Type: Improvement
          Components: Content Distribution
    Affects Versions: Content Distribution Journal Core 0.1.2
            Reporter: Timothee Maret
            Assignee: Timothee Maret
             Fix For: Content Distribution Journal Core 0.1.4


The average load generated by JournalAvailabilityChecker multiplies quickly for 
multi-tenant deployments. The checker can be configured (via Sling Scheduler 
{{scheduler.period}}) to reduce the polling frequency, but doing so also reduces 
the sensitivity to availability changes.

To reduce the load while keeping the sensitivity, we should support an 
exponential backoff algorithm. The algorithm would divide the polling rate by 
two (up to a limit) every time the availability status does not change, and 
reset the rate when the status changes. Steady states (available or 
unavailable) would eventually yield the least load. In the average case 
(availability status is steady) the load would be reduced up to the limit. In 
the worst case (availability changes all the time) the load would not be 
reduced compared to today. 

The base period would be the Sling Scheduler {{scheduler.period}}. The polling 
period at time t + 1 would be computed as follows: Period~t+1~ = 
Multiplier~t+1~ * {{scheduler.period}}, so doubling the multiplier halves the 
polling rate. The table below summarises how the multiplier would evolve 
according to the availability status change. 
||State~t~||State~t+1~||Multiplier~t+1~||
|unavailable|unavailable|min(2 * Multiplier~t~, limit)|
|unavailable|available|1|
|available|unavailable|1|
|available|available|min(2 * Multiplier~t~, limit)|

The limit would be hardcoded to 16, which would reduce the load by an order of 
magnitude. We could expose the limit as a configuration later if needed.
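
As a rough illustration of the multiplier update described above, here is a 
minimal Java sketch. It assumes the limit acts as an upper bound on the 
multiplier and that the very first check counts as a status change. The class 
{{BackoffMultiplier}} and its methods are hypothetical names for this sketch, 
not existing Sling code.

```java
/** Hypothetical helper illustrating the proposal; not part of the Sling code base. */
final class BackoffMultiplier {

    /** Upper bound on the multiplier, hardcoded to 16 as proposed. */
    static final int LIMIT = 16;

    private int multiplier = 1;

    /** Result of the previous check; null until the first check (assumption). */
    private Boolean lastAvailable;

    /**
     * Records the latest availability check and returns the multiplier
     * to apply to the base period before the next check.
     */
    int update(boolean available) {
        if (lastAvailable != null && lastAvailable == available) {
            // Status unchanged: double the multiplier, capped at LIMIT.
            multiplier = Math.min(2 * multiplier, LIMIT);
        } else {
            // Status changed (or first check): fall back to the base period.
            multiplier = 1;
        }
        lastAvailable = available;
        return multiplier;
    }

    /** Period before the next check, given the configured base period. */
    long nextPeriodSeconds(long basePeriodSeconds) {
        return multiplier * basePeriodSeconds;
    }
}
```

With a steady status, successive calls to {{update}} would return 1, 2, 4, 8, 
16, 16, and so on. A single status change brings the multiplier back to 1.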

There should be no need to randomise the multiplier for now, as the checkers 
are expected to be started at random times. If we hit a scenario where the 
checkers start at the same time, we could simply randomise the first scheduled 
event.
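
Should randomising the first scheduled event become necessary, one possible 
approach is to draw the initial delay uniformly from one base period. The 
sketch below is only an assumption about how this could look. The class 
{{InitialDelay}} and its method are hypothetical, and how the delay would be 
passed to the scheduler is left open.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper; not part of the Sling code base. */
final class InitialDelay {

    private InitialDelay() {
    }

    /**
     * Returns a delay in milliseconds drawn uniformly from
     * [0, basePeriodMillis), to spread the first check of each checker.
     */
    static long randomInitialDelayMillis(long basePeriodMillis) {
        if (basePeriodMillis <= 0) {
            throw new IllegalArgumentException("basePeriodMillis must be positive");
        }
        return ThreadLocalRandom.current().nextLong(basePeriodMillis);
    }
}
```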

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
