[ https://issues.apache.org/jira/browse/SLING-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197535#comment-15197535 ]

Stefan Egli commented on SLING-5560:
------------------------------------

[~chetanm] re
bq. 30 sec default (as was the case earlier)
I can't find anywhere in the code that we 'earlier' had a default delay (be it
30sec, 10sec or otherwise). I've checked the {{JobManagerConfiguration}} back
to sling.event 3.4.0, and it has always done a {{CheckTopologyTask.fullRun}}
immediately on receiving {{TOPOLOGY_INIT}}.

Suggestion on how to implement this:
* simply, via a single config:
** have a new parameter 'startupDelay' of, say, 30 or 60sec by default (I think
we shouldn't re-/ab-use the existing 'backgroundLoadDelay', as a) it is
orthogonal, b) it would apply in addition anyway, and c) it also applies to
{{TOPOLOGY_CHANGED}}, not only at startup)
** any topology event that is received *before* the startupDelay has passed is
'queued'
** once the startupDelay has passed, those queued topology events are
processed. This might not be as simple as just calling the existing
{{handleTopologyEvent}} method in a loop, as the queued events likely contain
views that are no longer current ({{!current}}). So perhaps the logic there must
be slightly modified; not sure.
* via an automatism plus a config (as already suggested above, here in more
detail):
** upon an actual topology change, store a 'copy' of the clusterView (i.e. all
slingIds) under {{/var/eventing/clusterInstances/<mySlingId>}}
** upon {{TOPOLOGY_INIT}}, compare the then-current clusterView with what's
stored under {{/var/eventing/clusterInstances/<mySlingId>}}
*** if it matches, go ahead
*** if it doesn't match, wait at most 'maxStartupDelay' and then go ahead
anyway, even though that means doing a reassignment (bite the bullet)
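A minimal sketch of the simple approach, in plain Java and deliberately independent of the Sling discovery API: {{SimpleTopologyEvent}}, {{StartupDelayGate}}, {{maybeRelease}} and the callback are hypothetical names, and the caller is assumed to pass in the current time and to call {{maybeRelease}} from a scheduler once 'startupDelay' may have passed. Instead of replaying every queued event, it keeps only the latest event that carries a view (a newer event makes earlier views stale), which is one possible answer to the {{!current}} concern. It assumes the handler can cope with a {{TOPOLOGY_CHANGED}} being the first event it sees.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for a topology event (NOT the Sling discovery API). */
final class SimpleTopologyEvent {
    enum Type { TOPOLOGY_INIT, TOPOLOGY_CHANGING, TOPOLOGY_CHANGED, PROPERTIES_CHANGED }

    final Type type;
    final List<String> slingIds; // instances of the new view; empty for TOPOLOGY_CHANGING

    SimpleTopologyEvent(Type type, List<String> slingIds) {
        this.type = type;
        this.slingIds = Collections.unmodifiableList(new ArrayList<>(slingIds));
    }
}

/** Hypothetical gate: holds back topology events until 'startupDelay' has passed. */
final class StartupDelayGate {
    private final long startupDelayMillis;
    private final long activationTimeMillis;
    private final Consumer<SimpleTopologyEvent> handler;
    // Latest queued event carrying a view; older views are stale once a newer one arrives.
    private SimpleTopologyEvent pending;
    private boolean released;

    StartupDelayGate(long startupDelayMillis, long activationTimeMillis,
                     Consumer<SimpleTopologyEvent> handler) {
        this.startupDelayMillis = startupDelayMillis;
        this.activationTimeMillis = activationTimeMillis;
        this.handler = handler;
    }

    synchronized void onEvent(SimpleTopologyEvent event, long nowMillis) {
        if (released) {
            handler.accept(event); // after the delay, events pass straight through
            return;
        }
        // While the topology is changing there is no usable view, so drop what was queued.
        pending = (event.type == SimpleTopologyEvent.Type.TOPOLOGY_CHANGING) ? null : event;
        maybeRelease(nowMillis);
    }

    /** Expected to be called by a scheduler once the delay may have passed. */
    synchronized void maybeRelease(long nowMillis) {
        if (released || nowMillis - activationTimeMillis < startupDelayMillis) {
            return; // still within startupDelay: keep holding back
        }
        released = true;
        if (pending != null) {
            handler.accept(pending); // process only the latest view, not every queued event
            pending = null;
        }
    }
}
```

Collapsing the queue to a single event trades completeness for simplicity: intermediate views are never handed to the job manager, which matches the goal of not reacting to a topology that is still settling.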

[~cziegeler], wdyt? Should we go for the simple or the automatism approach?
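For comparison, a minimal sketch of the decision logic behind the automatism approach. Again plain Java and hypothetical: {{ClusterViewStartupCheck}} and its methods are made-up names, a {{Map}} stands in for the repository node under {{/var/eventing/clusterInstances/<mySlingId>}}, and the caller is assumed to re-evaluate {{mayStartJobProcessing}} on each new view or timer tick after {{TOPOLOGY_INIT}}. The proposal doesn't say how a missing stored view (very first start) should be treated; here it simply counts as a mismatch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper: decides when job processing may start after TOPOLOGY_INIT,
 * based on the cluster view stored at the last actual topology change.
 */
final class ClusterViewStartupCheck {
    private final Map<String, Set<String>> store; // in-memory stand-in for the repository
    private final String mySlingId;
    private final long maxStartupDelayMillis;

    ClusterViewStartupCheck(Map<String, Set<String>> store, String mySlingId,
                            long maxStartupDelayMillis) {
        this.store = store;
        this.mySlingId = mySlingId;
        this.maxStartupDelayMillis = maxStartupDelayMillis;
    }

    String storagePath() {
        return "/var/eventing/clusterInstances/" + mySlingId;
    }

    /** Upon an actual topology change: remember a copy of the cluster view (all slingIds). */
    void onTopologyChanged(Set<String> clusterSlingIds) {
        store.put(storagePath(), new HashSet<>(clusterSlingIds));
    }

    /**
     * After TOPOLOGY_INIT: start immediately if the current view matches the stored one,
     * otherwise only once maxStartupDelay has elapsed (then accept the reassignment).
     */
    boolean mayStartJobProcessing(Set<String> currentClusterSlingIds,
                                  long initTimeMillis, long nowMillis) {
        Set<String> previous = store.get(storagePath()); // null on the very first start
        if (currentClusterSlingIds.equals(previous)) {
            return true; // same cluster as before shutdown: no stale job handling expected
        }
        return nowMillis - initTimeMillis >= maxStartupDelayMillis; // bite the bullet
    }
}
```

Compared to the fixed 'startupDelay', a restart into an unchanged cluster pays no delay at all; only a changed (or still incomplete) cluster waits, and never longer than 'maxStartupDelay'.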

> Delay job processing at startup to avoid unnecessary stale job handling
> -----------------------------------------------------------------------
>
>                 Key: SLING-5560
>                 URL: https://issues.apache.org/jira/browse/SLING-5560
>             Project: Sling
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Chetan Mehrotra
>            Assignee: Stefan Egli
>             Fix For: Event 4.1.0
>
>
> While running in a cluster (or in some cases even in a non-cluster setup), the
> topology only becomes stable after "some" time. For example, in a 2-node setup
> the second node might not have started yet by the time the first node comes
> up, so the topology would not detect it. The first node might then conclude
> that the second node is not there and start reassigning that node's jobs to
> itself as part of stale job processing.
> Instead of doing this right at startup, job processing should start after
> "some" delay, so that the topology has time to become stable. This would avoid
> the unnecessary work and probably even reduce the load on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
