[ https://issues.apache.org/jira/browse/STORM-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim resolved STORM-3121.
---------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0
Thanks [~Srdo], I merged into master.
> Fix flaky metrics tests in storm-core
> -------------------------------------
>
> Key: STORM-3121
> URL: https://issues.apache.org/jira/browse/STORM-3121
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 2.0.0
> Reporter: Stig Rohde Døssing
> Assignee: Stig Rohde Døssing
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> The tests are flaky, but they fail only rarely. I've only seen them fail on
> Travis, and only when Travis is under load.
> Example failures:
> {code}
> classname: org.apache.storm.metrics-test / testname: test-custom-metric-with-multi-tasks
> expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec
> (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2"
> "my-custom-metric") 0 N__3207__auto__))
> actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0]))
> at: test_runner.clj:105
> {code}
> {code}
> classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2
> expected: (clojure.core/= [1 1] (clojure.core/subvec
> (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name!
> "myspout" "__emit-count/default") 0 N__3207__auto__))
> actual: (not (clojure.core/= [1 1] [1 0]))
> at: test_runner.clj:105
> {code}
> The problem is that the tests increment metrics counters in the executor
> async loops, then expect the counts to end up in specific metrics buckets.
> A new bucket is created each time the metrics timer fires. The timer is
> included in time simulation and in LocalCluster.waitForIdle, but the executor
> async loop isn't. Nothing guarantees that the executor async loop gets to run
> when the test does a sequence like
> {code}
> Time.advanceClusterTime
> cluster.waitForIdle
> {code}
> because the waitForIdle check doesn't know about the executor async loop.
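
The race described above can be illustrated with a minimal, self-contained Java sketch. It does not use Storm and is not the change that was merged for this issue; BucketRaceSketch, executorIncrement, timerTick and awaitCondition are hypothetical names standing in for the executor async loop, the metrics timer and a test-side wait. It assumes a counter that is drained into a new bucket on every timer tick.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical model of the race; none of these names come from Storm.
class BucketRaceSketch {
    private final AtomicLong counter = new AtomicLong();
    private final List<Long> buckets = new ArrayList<>();

    // Stand-in for the executor async loop incrementing a metric.
    void executorIncrement(long n) {
        counter.addAndGet(n);
    }

    // Stand-in for the metrics timer: drains the counter into a new bucket.
    synchronized void timerTick() {
        buckets.add(counter.getAndSet(0));
    }

    synchronized List<Long> buckets() {
        return new ArrayList<>(buckets);
    }

    // Poll until the condition holds, failing after timeoutMs milliseconds.
    static void awaitCondition(BooleanSupplier condition, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        try {
            while (!condition.getAsBoolean()) {
                if (System.nanoTime() > deadline) {
                    throw new IllegalStateException("condition not met in time");
                }
                Thread.sleep(1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // The unlucky ordering, forced deterministically: the timer fires before
    // the executor has run, so the count lands one bucket late.
    static List<Long> tickBeforeIncrement() {
        BucketRaceSketch m = new BucketRaceSketch();
        m.timerTick();          // bucket 0 is created while the counter is still 0
        m.executorIncrement(1); // the "executor" only runs afterwards
        m.timerTick();          // the count shows up in bucket 1 instead
        return m.buckets();
    }

    // One possible remedy: wait for the observable effect of the executor
    // thread before letting the timer create the bucket.
    static List<Long> runWithBarrier() {
        BucketRaceSketch m = new BucketRaceSketch();
        Thread executor = new Thread(() -> m.executorIncrement(1));
        executor.start();
        awaitCondition(() -> m.counter.get() == 1, 5000);
        m.timerTick();
        return m.buckets();
    }

    public static void main(String[] args) {
        System.out.println(tickBeforeIncrement()); // [0, 1]
        System.out.println(runWithBarrier());      // [1]
    }
}
```

In this model the first method reproduces the shape of the failures quoted above (a count missing from the expected bucket), while the second makes the test wait on the counter itself rather than on an idle check that cannot see the incrementing thread. Whether the actual fix took this approach is not stated in this message.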