Stig Rohde Døssing created STORM-3321:
-----------------------------------------

             Summary: Tests are flaky due to long timeouts in Nimbus and 
supervisor when using LocalCluster
                 Key: STORM-3321
                 URL: https://issues.apache.org/jira/browse/STORM-3321
             Project: Apache Storm
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Stig Rohde Døssing
            Assignee: Stig Rohde Døssing


Tests will sometimes fail with timeout when using e.g. Testing.completeTopology.

The issue is that the timeout is 10 seconds, and Nimbus and the supervisor both 
have timers that monitor for new deployments that are also set to 10 seconds. 
This causes tests to time out because a lot of the test time is wasted waiting 
for Nimbus/the supervisors to catch that the test topology is deployed.

We should reduce these timeouts to their minimums.

There is also a race in Nimbus that can cause test failures 
{quote}
2019-01-21 02:00:19.587 [main] WARN  org.apache.storm.daemon.nimbus.Nimbus - 
Topology submission exception. (topology 
name='topologytest-45f5ad59-ec16-45a4-ba4a-eea992411cc1')
java.lang.RuntimeException: not a leader, current leader is 
NimbusInfo{host='DESKTOP-AGC8TKM', port=6627, isLeader=true}
        at 
org.apache.storm.daemon.nimbus.Nimbus.assertIsLeader(Nimbus.java:1525) 
~[classes/:?]
        at 
org.apache.storm.daemon.nimbus.Nimbus.submitTopologyWithOpts(Nimbus.java:2982) 
~[classes/:?]
        at 
org.apache.storm.daemon.nimbus.Nimbus.submitTopology(Nimbus.java:2965) 
~[classes/:?]
        at org.apache.storm.LocalCluster.submitTopology(LocalCluster.java:444) 
~[classes/:?]
        at org.apache.storm.LocalCluster.submitTopology(LocalCluster.java:125) 
~[classes/:?]
        at org.apache.storm.Testing.completeTopology(Testing.java:424) 
~[classes/:?]
{quote}

The issue is that Nimbus has to acquire leadership in order to submit 
topologies, but LocalCluster doesn't wait for the Nimbus instance it creates to 
gain leadership.

We should make LocalCluster wait for Nimbus to gain leadership.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to