[ 
https://issues.apache.org/jira/browse/SLING-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli resolved SLING-5310.
--------------------------------
    Resolution: Fixed

Fixed in rev 1714984, including a new test.

> MinEventDelayHandler should have a cancel method
> ------------------------------------------------
>
>                 Key: SLING-5310
>                 URL: https://issues.apache.org/jira/browse/SLING-5310
>             Project: Sling
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: Discovery Commons 1.0.4
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Minor
>             Fix For: Discovery Commons 1.0.6
>
>
> The {{ViewStateManagerImpl}} delegates to the {{MinEventDelayHandler}} the 
> task of delaying a {{TOPOLOGY_CHANGED}} event by a few seconds, which avoids 
> switching too frequently when multiple instances come and go. However, when 
> the ViewStateManagerImpl is stopped (via {{handleDeactivated}}), the 
> MinEventDelayHandler does not notice. As a result, it might continue in the 
> following loop: {{triggerAsyncDelaying}} schedules a runnable to be triggered 
> after a delay (3 seconds by default). When that runnable is triggered, it 
> checks the state of the view. If the view is not current (which is typically 
> the case after deactivation), it reschedules itself, expecting the view to 
> eventually become current/stable again. That is normally the case and is a 
> good way to guarantee that the view change can eventually be announced. 
> After deactivation, however, this will likely not occur, so the 
> MinEventDelayHandler would just keep spinning in this 3-second loop forever, 
> or until the ViewStateManager is reactivated.
> For normal operation this behavior is not a problem at all (hence priority 
> minor).
> For testing, however, it has the side effect that the loop spans into 
> subsequent tests and potentially interferes with them.
> One such interference has been noticed in the following failing test on 
> Jenkins:
> https://builds.apache.org/job/sling-trunk-1.7/org.apache.sling$org.apache.sling.discovery.impl/2751/testReport/org.apache.sling.discovery.impl.common.heartbeat/HeartbeatTest/testPartitioning/
> {code}
> java.lang.AssertionError: expected:<TOPOLOGY_INIT> but was:<TOPOLOGY_CHANGED>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:144)
>       at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.doTestPartitioning(HeartbeatTest.java:285)
>       at org.apache.sling.discovery.impl.common.heartbeat.HeartbeatTest.testPartitioning(HeartbeatTest.java:143)
> {code}
> where one 'issue heartbeat' operation triggered from {{doTestPartitioning}} 
> lasted over 5 seconds:
> {code}
> 17.11.2015 23:50:37.033 *DEBUG* [main] DiscoveryServiceImpl: updateProperties: done.
> 17.11.2015 23:50:37.033 *DEBUG* [main] HeartbeatHandler: issueClusterLocalHeartbeat: storing cluster-local heartbeat to repository for fe88cbb1-f967-48c5-a58d-30fd137909cc
> 17.11.2015 23:50:42.707 *DEBUG* [main] HeartbeatHandler: issueConnectorPings: not issuing remote heartbeat yet, startup not yet finished
> 17.11.2015 23:50:42.724 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: start. slingId: fe88cbb1-f967-48c5-a58d-30fd137909cc
> 17.11.2015 23:50:43.081 *DEBUG* [main] VotingHelper: listVotings: votings found: 0
> 17.11.2015 23:50:43.081 *DEBUG* [main] fe88cbb1-f967-48c5-a58d-30fd137909cc: analyzeVotings: no ongoing votings at the moment. done.
> 17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckView: established view matches with expected.
> 17.11.2015 23:50:43.082 *DEBUG* [main] HeartbeatHandler: doCheckViewWith: no pending nor winning votes. view is fine. we're all happy.
> {code}
> and the only explanation found so far was that the thread-pool that should 
> normally process background jobs was busy with all those scheduled jobs that 
> were left over from previous jobs.
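
The rescheduling loop described in the issue, and the way a cancel method can break it, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code committed for SLING-5310: the class name DelayHandlerSketch, the viewIsCurrent and announceChange hooks, and the method names onDelayElapsed, rescheduleCount and isCancelled are all hypothetical. Only triggerAsyncDelaying and the idea of a cancel method come from the issue text.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a delay handler; NOT the actual Sling class. */
class DelayHandlerSketch {
    private final ScheduledExecutorService scheduler;
    private final BooleanSupplier viewIsCurrent; // assumed hook: is the view current?
    private final Runnable announceChange;       // assumed hook: announce the view change
    private final long delayMillis;

    private ScheduledFuture<?> pending; // the most recently scheduled runnable, if any
    private boolean cancelled;
    private int reschedules;

    DelayHandlerSketch(ScheduledExecutorService scheduler, BooleanSupplier viewIsCurrent,
                       Runnable announceChange, long delayMillis) {
        this.scheduler = scheduler;
        this.viewIsCurrent = viewIsCurrent;
        this.announceChange = announceChange;
        this.delayMillis = delayMillis;
    }

    /** Schedules the delayed check, unless the handler has been cancelled. */
    synchronized void triggerAsyncDelaying() {
        if (cancelled) {
            return;
        }
        pending = scheduler.schedule(this::onDelayElapsed, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Body of the scheduled runnable: announce, or reschedule if the view is not current. */
    synchronized void onDelayElapsed() {
        if (cancelled) {
            return; // without this check the loop would continue after deactivation
        }
        if (viewIsCurrent.getAsBoolean()) {
            announceChange.run();
        } else {
            reschedules++;
            triggerAsyncDelaying();
        }
    }

    /** Stops the loop: no further rescheduling, and the pending runnable is cancelled. */
    synchronized void cancel() {
        cancelled = true;
        if (pending != null) {
            pending.cancel(false);
        }
    }

    synchronized int rescheduleCount() {
        return reschedules;
    }

    synchronized boolean isCancelled() {
        return cancelled;
    }
}
```

In this sketch, the owner of the handler would call cancel() from its deactivation path, so that no leftover runnable keeps occupying the shared thread pool during subsequent tests. A delay of 3000 ms would correspond to the 3-second default mentioned in the description.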



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
