GitHub user srdo opened a pull request:
https://github.com/apache/storm/pull/2933
STORM-3309: Fix flaky tick tuple test
https://issues.apache.org/jira/browse/STORM-3309
I've made the following changes:
* When message timeouts are disabled, the acker shouldn't time out tuples.
Disable ticks for the acker if message timeouts are disabled.
* The spout and bolt executors don't integrate with time simulation, in the
sense that they don't require simulated time to increment in order to run. This
is fine, but if they aren't going to pause for simulated time to increase, they
also shouldn't risk blocking on simulated time during initialization while they
wait for Nimbus to activate the topology.
* InProcMessaging (used by the FeederSpout) waits for the receiver to show
up when sending the initial message. It waits at most 20 seconds, but if time
simulation is enabled, it only waits 2 seconds. This is usually not enough for
the topology/spout to start. I set the simulated time increment to match the
real time spent waiting.
* The ZooKeeper log drowns out any useful logging, so I set its level to WARN
in storm-server.
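A minimal sketch of the acker rule from the first item, assuming the acker otherwise ticks once per message timeout interval (the ticks are what drive expiry of pending tuples). The two config keys are Storm's; the AckerTickConfig class and the ackerTickFreqSecs method are hypothetical and only illustrate the decision, not the code in this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: decides whether the acker gets tick tuples at all.
final class AckerTickConfig {
    static final String ENABLE_MESSAGE_TIMEOUTS = "topology.enable.message.timeouts";
    static final String MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";

    /** Returns the acker tick frequency in seconds, or null when ticks should be disabled. */
    static Integer ackerTickFreqSecs(Map<String, Object> topoConf) {
        boolean timeoutsEnabled = (Boolean) topoConf.getOrDefault(ENABLE_MESSAGE_TIMEOUTS, true);
        if (!timeoutsEnabled) {
            // Without message timeouts the acker must never expire pending tuples,
            // so it has no use for ticks.
            return null;
        }
        // Assumption: the acker ticks once per message timeout interval.
        return ((Number) topoConf.get(MESSAGE_TIMEOUT_SECS)).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(MESSAGE_TIMEOUT_SECS, 30);
        System.out.println(ackerTickFreqSecs(conf)); // 30
        conf.put(ENABLE_MESSAGE_TIMEOUTS, false);
        System.out.println(ackerTickFreqSecs(conf)); // null
    }
}
```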
The TickTupleTest has been amended a bit. The problem with the current code
is that LocalCluster.waitForIdle doesn't cover the spout and bolt executor async
loops, so we can end up in a situation where the test fails spuriously.
Example:
The test starts by incrementing cluster time until the bolt receives a tick
tuple. Starting from t=0, it is possible that the test sets cluster time to 10
and waits until the tick thread has added some tuples. The bolt thread runs
independently of time simulation, and will consume the first tick at some
arbitrary time. If we are unlucky, we can get the following sequence:
* 10 ticks are added by tick thread
* Bolt consumes first tick
* All threads covered by LocalCluster.waitForIdle (but not the bolt thread)
are now idle, so the test exits the loop waiting for ticks
* The received ticks list is cleared
* The test records the time at which the list was cleared, advances cluster
time by 1 and checks that a tick is received
* The bolt may only now be processing some of the previously queued ticks.
This causes the test to fail, because the bolt can receive multiple ticks at the
same simulated time.
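The sequence above can be replayed deterministically in a single thread. This is a hypothetical model, not code from the PR: a deque stands in for the bolt's receive queue, and it assumes the bolt records the current simulated time for each tick it processes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-threaded replay of the unlucky interleaving.
final class TickRaceReplay {
    /** Returns the simulated times recorded for ticks processed after the list is cleared. */
    static List<Long> replay() {
        Deque<Long> boltQueue = new ArrayDeque<>(); // stands in for the bolt's receive queue
        List<Long> received = new ArrayList<>();    // what the test inspects
        long simTime = 0;

        // The test jumps cluster time to 10; the tick thread enqueues one tick per second.
        for (long t = 1; t <= 10; t++) {
            simTime = t;
            boltQueue.add(t);
        }
        // The bolt thread happens to consume only the first tick so far.
        boltQueue.poll();
        received.add(simTime);

        // waitForIdle ignores the bolt thread, so the test clears the list and advances by 1.
        received.clear();
        simTime += 1;
        boltQueue.add(simTime); // the single tick the test expects

        // The bolt now drains its backlog, recording each tick at the current simulated time.
        while (!boltQueue.isEmpty()) {
            boltQueue.poll();
            received.add(simTime);
        }
        return received;
    }

    public static void main(String[] args) {
        List<Long> received = replay();
        // The test expects exactly one tick at t=11, but the backlog yields ten.
        System.out.println(received.size() + " ticks recorded at t=" + received.get(0));
    }
}
```

Running main prints `10 ticks recorded at t=11`, which is the "multiple ticks at the same simulated time" failure.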
The replacement test instead uses a bootstrap tuple to verify that the
executor (and tick thread) have started, and then advances cluster time by the
full tick interval. The tick interval is chosen so the tick thread will not
produce any ticks until the test advances time enough to trigger one. This
allows the test to verify that exactly one tick is received per second.
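The idea behind the replacement test can be modeled with a hypothetical tick source that fires only when simulated time reaches the next multiple of the interval after the executor started (the bootstrap tuple marks that start). TickStepModel and its methods are illustrative names, not the real TickTupleTest code; the point is that advancing one full interval at a time yields exactly one tick per step, and advancing less yields none.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the revised test strategy.
final class TickStepModel {
    private final long intervalSecs;
    private long nowSecs;
    private long nextTickSecs;
    private final List<Long> ticks = new ArrayList<>();

    TickStepModel(long startSecs, long intervalSecs) {
        this.intervalSecs = intervalSecs;
        this.nowSecs = startSecs;
        // No tick is due until one full interval after the executor started.
        this.nextTickSecs = startSecs + intervalSecs;
    }

    /** Advances simulated time and returns how many ticks fired during this step. */
    int advance(long secs) {
        nowSecs += secs;
        int fired = 0;
        while (nextTickSecs <= nowSecs) {
            ticks.add(nextTickSecs);
            nextTickSecs += intervalSecs;
            fired++;
        }
        return fired;
    }

    List<Long> ticks() {
        return ticks;
    }

    public static void main(String[] args) {
        // Bootstrap tuple seen at t=0: the executor and tick thread are running.
        TickStepModel model = new TickStepModel(0, 1);
        for (int i = 0; i < 5; i++) {
            // Each full-interval step should yield exactly one tick.
            System.out.println("step " + i + ": " + model.advance(1) + " tick(s)");
        }
        System.out.println(model.ticks()); // [1, 2, 3, 4, 5]
    }
}
```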
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/srdo/storm STORM-3309
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2933.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2933
----
commit 6ca657d13d7f0ec50be2baed7fd8c70df5c9deca
Author: Stig Rohde Døssing <srdo@...>
Date: 2019-01-05T13:38:04Z
STORM-3309: Fix flaky tick tuple test
----