GitHub user srdo opened a pull request:

    https://github.com/apache/storm/pull/2933

    STORM-3309: Fix flaky tick tuple test

    https://issues.apache.org/jira/browse/STORM-3309
    
    I've made the following changes:
    * When message timeouts are disabled, the acker shouldn't time out tuples. 
Ticks are now disabled for the acker if message timeouts are disabled.
    * The spout and bolt executors don't integrate with time simulation, in the 
sense that they don't require simulated time to increment in order to run. This 
is fine, but if they aren't going to pause for simulated time to increase, they 
also shouldn't potentially pause during initialization, waiting for Nimbus to 
activate the topology.
    * InProcMessaging (used by the FeederSpout) will wait for the receiver to 
show up when sending the initial message. It waits at most 20 seconds, but if 
time simulation is enabled, it only waits 2 seconds. In most cases this is not 
long enough for the topology/spout to start. I set the simulated time increment 
to match the real time spent waiting.
    * The Zookeeper log drowns out any useful logging, so I set its level to 
WARN in storm-server.
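
As a rough illustration of the first change above, the acker's tick setting can be derived from the topology's timeout settings. This is a minimal sketch, not the code in this pull request: `AckerTickSketch` and `ackerConf` are hypothetical names, and the string keys are assumed to correspond to Storm's message timeout and tick frequency settings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not Storm's actual implementation.
final class AckerTickSketch {
    // Assumed config keys; treat them as placeholders if they differ in your version.
    static final String ENABLE_TIMEOUTS = "topology.enable.message.timeouts";
    static final String TIMEOUT_SECS = "topology.message.timeout.secs";
    static final String TICK_FREQ_SECS = "topology.tick.tuple.freq.secs";

    // Builds the acker's component config. A tick frequency is only set
    // when message timeouts are enabled, so a topology with timeouts
    // disabled never ticks the acker and never expires pending tuples.
    static Map<String, Object> ackerConf(Map<String, Object> topoConf) {
        Map<String, Object> conf = new HashMap<>();
        Object enabled = topoConf.getOrDefault(ENABLE_TIMEOUTS, true);
        if (Boolean.TRUE.equals(enabled)) {
            conf.put(TICK_FREQ_SECS, topoConf.get(TIMEOUT_SECS));
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, Object> topoConf = new HashMap<>();
        topoConf.put(TIMEOUT_SECS, 30);
        topoConf.put(ENABLE_TIMEOUTS, false);
        System.out.println(ackerConf(topoConf));
    }
}
```

With timeouts disabled the returned map has no tick entry, which is the behavior the first bullet describes.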
    
    The TickTupleTest has been amended a bit. The problem with the current code 
is that LocalCluster.waitForIdle doesn't cover spout and bolt executor async 
loops, so we can end up in a situation where the test fails spuriously.
    
    Example:
    The test starts by incrementing cluster time until the bolt receives a tick 
tuple. Starting from t=0, it is possible that the test sets cluster time to 10 
and waits until the tick thread has added some tuples. The bolt thread runs 
independently of time simulation, and will consume the first tick at some 
arbitrary time. If we are unlucky, we can get the following sequence:
    
    * 10 ticks are added by tick thread
    * Bolt consumes first tick
    * All threads covered by LocalCluster.waitForIdle (but not the bolt thread) 
are now idle, so the test exits the loop waiting for ticks
    * The received ticks list is cleared
    * The test stores what time the list was cleared at, advances cluster time 
by 1 and checks that a tick is received
    * The bolt may just now be processing some of the previously queued ticks. 
This will cause the test to fail, because the bolt may receive multiple ticks 
at the same simulated time.
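
The unlucky interleaving above can be replayed deterministically in a single thread. This is only a sketch of the sequence of events, with hypothetical names (`TickRaceSketch`, `ticksSeenAfterClear`); it does not use LocalCluster or Storm's time simulation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Single-threaded replay of the failing interleaving; illustration only.
final class TickRaceSketch {
    // Returns how many ticks the test observes after advancing time by 1.
    static int ticksSeenAfterClear() {
        Queue<Long> pendingTicks = new ArrayDeque<>(); // filled by the "tick thread"
        List<Long> receivedTicks = new ArrayList<>();  // filled by the "bolt"

        // The test jumps simulated time to 10, so the tick thread queues 10 ticks.
        long simulatedTime = 10;
        for (long t = 1; t <= simulatedTime; t++) {
            pendingTicks.add(t);
        }

        // The bolt consumes only the first tick before the test looks.
        receivedTicks.add(pendingTicks.poll());

        // The test sees a tick, leaves its wait loop and clears the list.
        receivedTicks.clear();

        // The test advances time by 1 and expects exactly one tick.
        simulatedTime += 1;
        pendingTicks.add(simulatedTime);

        // The bolt now drains the 9 stale ticks plus the new one.
        while (!pendingTicks.isEmpty()) {
            receivedTicks.add(pendingTicks.poll());
        }
        return receivedTicks.size();
    }

    public static void main(String[] args) {
        System.out.println("ticks seen after advancing by 1: " + ticksSeenAfterClear());
    }
}
```

The test expects one tick at this point, but the stale ticks queued earlier arrive together with it.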
    
    The replacement test instead uses a bootstrap tuple to verify that the 
executor (and tick thread) have started, and then advances simulated time by 
the full tick interval. The tick interval is chosen so the tick thread will not 
produce any ticks until the test advances time enough to trigger one. This 
allows the test to verify that exactly one tick is received per second.
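
The idea behind the replacement test can be sketched with a toy tick source. This is a simplified model under stated assumptions: `FullIntervalSketch` and `TickSource` are hypothetical and stand in for the tick thread and simulated clock, not for the actual test code.

```java
// Toy model of a tick thread driven by simulated time; illustration only.
final class FullIntervalSketch {
    static final class TickSource {
        private final long intervalSecs;
        private long nextTickAt;

        TickSource(long intervalSecs, long startTime) {
            this.intervalSecs = intervalSecs;
            this.nextTickAt = startTime + intervalSecs;
        }

        // Returns the number of ticks produced when simulated time reaches 'now'.
        int advanceTo(long now) {
            int ticks = 0;
            while (now >= nextTickAt) {
                ticks++;
                nextTickAt += intervalSecs;
            }
            return ticks;
        }
    }

    // Advances time by one full interval per step and records the ticks seen.
    static int[] ticksPerStep(long intervalSecs, int steps) {
        TickSource source = new TickSource(intervalSecs, 0);
        long now = 0;
        int[] seen = new int[steps];
        for (int i = 0; i < steps; i++) {
            now += intervalSecs;
            seen[i] = source.advanceTo(now);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(ticksPerStep(1, 3)));
    }
}
```

Because no ticks are queued before the first advance, each step of one full interval yields exactly one tick, which is what the test asserts.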

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srdo/storm STORM-3309

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2933.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2933
    
----
commit 6ca657d13d7f0ec50be2baed7fd8c70df5c9deca
Author: Stig Rohde Døssing <srdo@...>
Date:   2019-01-05T13:38:04Z

    STORM-3309: Fix flaky tick tuple test

----

