[
https://issues.apache.org/jira/browse/SLING-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197535#comment-15197535
]
Stefan Egli commented on SLING-5560:
------------------------------------
[~chetanm] re
bq. 30 sec default (as was the case earlier)
I can't find anywhere in the code that we 'earlier' had a default delay
(be that 30sec or 10sec or otherwise). I've checked the
{{JobManagerConfiguration}} back to sling.event 3.4.0 and it always does a
{{CheckTopologyTask.fullRun}} immediately on receiving the {{TOPOLOGY_INIT}}.
Suggestion on how to implement this:
* simple: by a single config
** have a new parameter 'startupDelay' of say 30 or 60sec by default (I think
we shouldn't re/ab-use the existing 'backgroundLoadDelay' as a) that is
orthogonal, b) it applies in addition to the startup delay and c) it also
applies for {{TOPOLOGY_CHANGED}}, not only for startup)
** any topology event that is received *before* the startupDelay has passed is
'queued'
** once the startupDelay has passed, those queued topology events are
processed. This might not be as simple as just calling the existing
{{handleTopologyEvent}} method in a loop, as the queued events likely contain
{{!current}} views. So perhaps the logic must be slightly modified there, not sure.
* by automatism and a config (as already suggested above, here with more
details):
** upon an actual topology change, store a 'copy' of the clusterView (ie all
slingIds) under {{/var/eventing/clusterInstances/<mySlingId>}}
** upon {{TOPOLOGY_INIT}} compare the then-current clusterView with what's
stored under {{/var/eventing/clusterInstances/<mySlingId>}}
*** if it matches go ahead
*** if it doesn't match, wait at most 'maxStartupDelay' and then go ahead
anyway, even though that then means doing a reassignment (bite the bullet)
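
To make the two options a bit more concrete, here is a rough sketch in plain Java. It is only an illustration under assumptions: {{StartupDelayGate}}, {{offer}}, {{flush}} and {{mustWait}} are hypothetical names that do not exist in sling.event, the events are generic placeholders rather than real {{TopologyEvent}} objects, and the stored slingIds are passed in as a plain set instead of being read from the repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper, NOT part of sling.event: buffers topology events
 * until a configured startup delay has elapsed.
 */
final class StartupDelayGate<E> {

    private final long startupDelayMillis;
    private final long startedAtMillis;
    private final List<E> queued = new ArrayList<>();

    StartupDelayGate(long startupDelayMillis, long startedAtMillis) {
        this.startupDelayMillis = startupDelayMillis;
        this.startedAtMillis = startedAtMillis;
    }

    /** True once the startup delay has elapsed at the given time. */
    synchronized boolean isOpen(long nowMillis) {
        return nowMillis - startedAtMillis >= startupDelayMillis;
    }

    /**
     * Option 1: queue the event. Returns the events to process now, in
     * arrival order; the list is empty while the delay is still running.
     */
    synchronized List<E> offer(E event, long nowMillis) {
        queued.add(event);
        return isOpen(nowMillis) ? drain() : new ArrayList<>();
    }

    /** To be called (e.g. by a timer) once the delay should have expired. */
    synchronized List<E> flush(long nowMillis) {
        return isOpen(nowMillis) ? drain() : new ArrayList<>();
    }

    private List<E> drain() {
        List<E> out = new ArrayList<>(queued);
        queued.clear();
        return out;
    }

    /**
     * Option 2: on TOPOLOGY_INIT, wait only if the current slingIds differ
     * from the copy stored at the last actual topology change.
     */
    static boolean mustWait(Set<String> storedSlingIds, Set<String> currentSlingIds) {
        return !storedSlingIds.equals(currentSlingIds);
    }
}
```

The drained list still contains every queued event, including ones whose views are no longer current, so the caller would have to decide whether to replay all of them or only act on the last one.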
[~cziegeler], wdyt? should we go for the simple or the automatism approach?
> Delay job processing at startup to avoid unnecessary stale job handling
> -----------------------------------------------------------------------
>
> Key: SLING-5560
> URL: https://issues.apache.org/jira/browse/SLING-5560
> Project: Sling
> Issue Type: Improvement
> Components: Extensions
> Reporter: Chetan Mehrotra
> Assignee: Stefan Egli
> Fix For: Event 4.1.0
>
>
> While running in a cluster (or in some cases also in a non-cluster setup), the
> topology becomes stable only after "some" time. E.g. in a 2-node setup, by the
> time the first node comes up the second node might not have started, so the
> topology would not detect it. The first node might then think that the second
> node is not there and start assigning that node's jobs to the current node
> under stale job processing.
> Instead of doing this right at startup, job processing should start after
> "some" delay, such that the topology becomes stable. This would avoid this
> unnecessary work and probably even reduce load on the master.