Stefan Egli created SLING-5310:
----------------------------------
Summary: MinEventDelayHandler should have a cancel method
Key: SLING-5310
URL: https://issues.apache.org/jira/browse/SLING-5310
Project: Sling
Issue Type: Improvement
Components: Extensions
Affects Versions: Discovery Commons 1.0.4
Reporter: Stefan Egli
Assignee: Stefan Egli
Priority: Minor
Fix For: Discovery Commons 1.0.6
The {{ViewStateManagerImpl}} delegates to the {{MinEventDelayHandler}} the
task of delaying a {{TOPOLOGY_CHANGED}} event by a few seconds, which avoids
switching too frequently when multiple instances come and go. However, when
the {{ViewStateManagerImpl}} is stopped (via {{handleDeactivated}}), the
{{MinEventDelayHandler}} does not notice. As a result, it may continue in the
following loop: {{triggerAsyncDelaying}} schedules a runnable to be triggered
after a delay (3 seconds by default). When the runnable is triggered, it
checks the state of the view. If the view is not current (which is typically
the case after deactivation), it reschedules itself, on the assumption that
the view will eventually become current/stable again. That assumption normally
holds, and rescheduling is a good way to guarantee that the view change is
eventually announced. After deactivation, however, the view will likely not
become current, so the {{MinEventDelayHandler}} keeps spinning in this
3-second loop forever, or until the ViewStateManager is reactivated.
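To illustrate the loop described above and how a cancel method could break it, here is a minimal sketch. It is not the Sling code: {{DelayHandlerSketch}}, its constructor parameters, {{checkCount}} and {{demo}} are hypothetical names introduced for illustration, and the delay is shortened from 3 seconds to 20 ms so the demo finishes quickly. Only {{triggerAsyncDelaying}} and the idea of a {{cancel}} method come from this issue.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for MinEventDelayHandler; not the Sling implementation. */
class DelayHandlerSketch {
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;
    private final BooleanSupplier viewIsCurrent; // assumed view-state check
    private final Runnable announce;             // assumed TOPOLOGY_CHANGED sender
    private boolean cancelled;                   // guarded by "this"
    private int checks;                          // guarded by "this"

    DelayHandlerSketch(ScheduledExecutorService scheduler, long delayMillis,
                       BooleanSupplier viewIsCurrent, Runnable announce) {
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
        this.viewIsCurrent = viewIsCurrent;
        this.announce = announce;
    }

    /** Schedules one delayed check unless the handler was cancelled. */
    synchronized void triggerAsyncDelaying() {
        if (!cancelled) {
            scheduler.schedule(this::check, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    private synchronized void check() {
        if (cancelled) {
            return; // breaks the reschedule loop after deactivation
        }
        checks++;
        if (viewIsCurrent.getAsBoolean()) {
            announce.run();
        } else {
            triggerAsyncDelaying(); // view not current yet: try again later
        }
    }

    /** Would be called from the owner's deactivation path. */
    synchronized void cancel() {
        cancelled = true;
    }

    synchronized int checkCount() {
        return checks;
    }

    /** Returns {checks at cancel time, checks some time after cancel}. */
    static int[] demo() {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        // The view never becomes current, as is typical after deactivation.
        DelayHandlerSketch handler = new DelayHandlerSketch(pool, 20, () -> false, () -> { });
        try {
            handler.triggerAsyncDelaying();
            Thread.sleep(200);            // let the loop spin a few times
            handler.cancel();
            int atCancel = handler.checkCount();
            Thread.sleep(200);            // the loop would keep spinning without cancel()
            return new int[] { atCancel, handler.checkCount() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("checks at cancel: " + counts[0]
                + ", checks afterwards: " + counts[1]);
    }
}
```

Because {{check}} and {{cancel}} synchronize on the same object in this sketch, no further check can run once {{cancel}} has returned, so the two counts reported by {{demo}} should be equal. A real fix might instead (or additionally) keep the scheduled future and cancel it; which approach suits {{MinEventDelayHandler}} depends on how it schedules its runnable.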
For normal operations this behavior is not a problem at all (hence the Minor
priority).
For testing, however, it has the side effect that the loop carries over into
subsequent tests and can interfere with them.
One such interference was observed in the following failing test on Jenkins:
https://builds.apache.org/job/sling-trunk-1.7/org.apache.sling$org.apache.sling.discovery.impl/2751/testReport/org.apache.sling.discovery.impl.common.heartbeat/HeartbeatTest/testPartitioning/
{code}
java.lang.AssertionError: expected:<TOPOLOGY_INIT> but was:<TOPOLOGY_CHANGED>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.doTestPartitioning(HeartbeatTest.java:285)
at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.testPartitioning(HeartbeatTest.java:143)
{code}
where one 'issue heartbeat' operation triggered from {{doTestPartitioning}}
lasted over 5 seconds:
{code}
17.11.2015 23:50:37.033 *DEBUG* [main] DiscoveryServiceImpl: updateProperties: done.
17.11.2015 23:50:37.033 *DEBUG* [main] HeartbeatHandler: issueClusterLocalHeartbeat: storing cluster-local heartbeat to repository for fe88cbb1-f967-48c5-a58d-30fd137909cc
17.11.2015 23:50:42.707 *DEBUG* [main] HeartbeatHandler: issueConnectorPings: not issuing remote heartbeat yet, startup not yet finished
17.11.2015 23:50:42.724 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: start. slingId: fe88cbb1-f967-48c5-a58d-30fd137909cc
17.11.2015 23:50:43.081 *DEBUG* [main] VotingHelper: listVotings: votings found: 0
17.11.2015 23:50:43.081 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: no ongoing votings at the moment. done.
17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckView: established view matches with expected.
17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckViewWith: no pending nor winning votes. view is fine. we're all happy.
{code}
and the only explanation found so far is that the thread pool that normally
processes background jobs was busy with the scheduled jobs left over from
previous tests.