Stefan Egli created SLING-5310:
----------------------------------

             Summary: MinEventDelayHandler should have a cancel method
                 Key: SLING-5310
                 URL: https://issues.apache.org/jira/browse/SLING-5310
             Project: Sling
          Issue Type: Improvement
          Components: Extensions
    Affects Versions: Discovery Commons 1.0.4
            Reporter: Stefan Egli
            Assignee: Stefan Egli
            Priority: Minor
             Fix For: Discovery Commons 1.0.6


The {{ViewStateManagerImpl}} delegates to the {{MinEventDelayHandler}} the task 
of delaying a {{TOPOLOGY_CHANGED}} event by a few seconds, to avoid 
too-frequent switching when multiple instances come and go. However, when the 
ViewStateManagerImpl is stopped (via {{handleDeactivated}}), the 
MinEventDelayHandler does not notice. As a result, it might keep running in the 
following loop: {{triggerAsyncDelaying}} schedules a runnable to be triggered 
after 3 seconds by default. When that runnable fires, it checks the state of the 
view. If the view is not current (which is typically the case after 
deactivation), it reschedules itself, assuming that the view will eventually 
become current/stable again. During normal operation this assumption holds, and 
it is a good way to guarantee that the view change is eventually announced. 
After deactivation, however, the view will likely not become current, so the 
MinEventDelayHandler would keep spinning in this 3-second loop forever, or until 
the ViewStateManager is reactivated.
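The loop described above, together with the cancel method this issue asks for, can be sketched in Java as follows. This is a minimal, hypothetical sketch and not the actual Discovery Commons code: the class name {{CancellableDelayHandler}}, the {{BooleanSupplier}} view check, the {{announce}} callback, {{getChecks}} and {{cancel}} are assumptions for illustration; only {{triggerAsyncDelaying}}, the self-rescheduling behaviour and the 3-second default come from the description above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch, not the actual Sling MinEventDelayHandler: a delay
 * handler that re-checks the view until it is current, plus a cancel()
 * method that stops the loop (as proposed in SLING-5310).
 */
public class CancellableDelayHandler {
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;              // Sling's default is 3 seconds
    private final BooleanSupplier viewIsCurrent; // assumption: stands in for the real view-state check
    private final Runnable announce;             // assumption: sends the delayed TOPOLOGY_CHANGED
    private ScheduledFuture<?> pending;          // guarded by this
    private boolean cancelled;                   // guarded by this
    private int checks;                          // guarded by this; number of view checks performed

    public CancellableDelayHandler(ScheduledExecutorService scheduler, long delayMillis,
                                   BooleanSupplier viewIsCurrent, Runnable announce) {
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
        this.viewIsCurrent = viewIsCurrent;
        this.announce = announce;
    }

    /** Schedules a view check after delayMillis, unless the handler was cancelled. */
    public synchronized void triggerAsyncDelaying() {
        if (!cancelled) {
            pending = scheduler.schedule(this::checkView, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    private void checkView() {
        synchronized (this) {
            if (cancelled) {
                return; // deactivated in the meantime: stop instead of rescheduling
            }
            checks++;
        }
        if (viewIsCurrent.getAsBoolean()) {
            announce.run();
        } else {
            triggerAsyncDelaying(); // without cancel(), this loops forever after deactivation
        }
    }

    /** Stops the loop; assumed to be called from handleDeactivated. */
    public synchronized void cancel() {
        cancelled = true;
        if (pending != null) {
            pending.cancel(false);
            pending = null;
        }
    }

    public synchronized int getChecks() {
        return checks;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        // The view never becomes current, as is typical after deactivation.
        CancellableDelayHandler handler = new CancellableDelayHandler(
                pool, 20, () -> false, () -> System.out.println("TOPOLOGY_CHANGED"));
        handler.triggerAsyncDelaying();
        Thread.sleep(200);
        handler.cancel();
        int atCancel = handler.getChecks();
        Thread.sleep(200);
        System.out.println("checks before cancel: " + atCancel
                + ", checks after cancel: " + (handler.getChecks() - atCancel));
        pool.shutdown();
    }
}
```

Because the cancelled flag is checked under the same lock that {{cancel()}} takes, no further view check runs once {{cancel()}} has returned, so the second counter stays at 0; without the {{cancel()}} call it would keep growing every delay period.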

For normal operations this behavior is not a problem at all (hence priority 
minor).

For testing, however, this has the side effect that the loop spills over into 
subsequent tests and can interfere with them.

One instance of such interference was observed in the following failing test on Jenkins:
https://builds.apache.org/job/sling-trunk-1.7/org.apache.sling$org.apache.sling.discovery.impl/2751/testReport/org.apache.sling.discovery.impl.common.heartbeat/HeartbeatTest/testPartitioning/
{code}
java.lang.AssertionError: expected:<TOPOLOGY_INIT> but was:<TOPOLOGY_CHANGED>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.doTestPartitioning(HeartbeatTest.java:285)
        at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.testPartitioning(HeartbeatTest.java:143)
{code}

where one 'issue heartbeat' operation triggered from {{doTestPartitioning}} 
lasted over 5 seconds:
{code}
17.11.2015 23:50:37.033 *DEBUG* [main] DiscoveryServiceImpl: updateProperties: done.
17.11.2015 23:50:37.033 *DEBUG* [main] HeartbeatHandler: issueClusterLocalHeartbeat: storing cluster-local heartbeat to repository for fe88cbb1-f967-48c5-a58d-30fd137909cc
17.11.2015 23:50:42.707 *DEBUG* [main] HeartbeatHandler: issueConnectorPings: not issuing remote heartbeat yet, startup not yet finished
17.11.2015 23:50:42.724 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: start. slingId: fe88cbb1-f967-48c5-a58d-30fd137909cc
17.11.2015 23:50:43.081 *DEBUG* [main] VotingHelper: listVotings: votings found: 0
17.11.2015 23:50:43.081 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: no ongoing votings at the moment. done.
17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckView: established view matches with expected.
17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckViewWith: no pending nor winning votes. view is fine. we're all happy.
{code}

and the only explanation found so far is that the thread pool that normally 
processes background jobs was busy with all the scheduled jobs left over from 
previous tests.
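How leftover loops pile up in a pool shared across tests can be illustrated with a small, deterministic Java sketch. It is hypothetical and not Sling code: {{LeftoverTaskDemo}}, {{leftoverTasks}} and the use of a single {{ScheduledThreadPoolExecutor}} shared by all simulated tests are assumptions. Each simulated test starts a loop that re-checks the view every 3 seconds; unless teardown cancels it, the task stays queued in the shared pool and, once due, competes for the pool's threads with the jobs of later tests.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration (not Sling code): each simulated "test" starts a
 * 3-second re-check loop on a pool shared across tests. Without cancelling the
 * loop on teardown, the leftover tasks accumulate in the pool's queue.
 */
public class LeftoverTaskDemo {

    /** Returns the number of tasks still queued after running the given number of tests. */
    static int leftoverTasks(int tests, boolean cancelOnTeardown) {
        ScheduledThreadPoolExecutor sharedPool = new ScheduledThreadPoolExecutor(1);
        sharedPool.setRemoveOnCancelPolicy(true); // drop cancelled tasks from the queue right away
        try {
            for (int i = 0; i < tests; i++) {
                // "test setup": start a loop that re-checks the view every 3 seconds
                ScheduledFuture<?> loop = sharedPool.scheduleWithFixedDelay(
                        () -> { /* view is never current after deactivation */ },
                        3, 3, TimeUnit.SECONDS);
                // "test teardown": deactivation should stop the loop
                if (cancelOnTeardown) {
                    loop.cancel(false);
                }
            }
            return sharedPool.getQueue().size();
        } finally {
            sharedPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cancel: " + leftoverTasks(5, false)); // prints 5
        System.out.println("with cancel:    " + leftoverTasks(5, true));  // prints 0
    }
}
```

Setting {{setRemoveOnCancelPolicy(true)}} makes cancelled tasks leave the queue immediately; with the default policy they would stay queued until their delay expired, although they would no longer run.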



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
