[ 
https://issues.apache.org/jira/browse/STORM-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved STORM-3121.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Thanks [~Srdo], I merged into master.

> Fix flaky metrics tests in storm-core
> -------------------------------------
>
>                 Key: STORM-3121
>                 URL: https://issues.apache.org/jira/browse/STORM-3121
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 2.0.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The tests are flaky, but only rarely fail. I've only seen them fail on Travis 
> when Travis is under load.
> Example failures:
> {code}
> classname: org.apache.storm.metrics-test / testname: 
> test-custom-metric-with-multi-tasks
> expected: (clojure.core/= [1 0 0 0 0 0 2] (clojure.core/subvec 
> (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! "2" 
> "my-custom-metric") 0 N__3207__auto__))
>   actual: (not (clojure.core/= [1 0 0 0 0 0 2] [1 0 0 0 0 0 0]))
>       at: test_runner.clj:105
> {code}
> {code}
> classname: org.apache.storm.metrics-test / testname: test-builtin-metrics-2
> expected: (clojure.core/= [1 1] (clojure.core/subvec 
> (org.apache.storm.metrics-test/lookup-bucket-by-comp-id-&-metric-name! 
> "myspout" "__emit-count/default") 0 N__3207__auto__))
>   actual: (not (clojure.core/= [1 1] [1 0]))
>       at: test_runner.clj:105
> {code}
> The problem is that the tests increment metrics counters in the executor 
> async loops, then expect the counters to end up in exact metrics buckets. The 
> creation of a bucket is triggered by the metrics timer. The timer is included 
> in time simulation and LocalCluster.waitForIdle, but the executor async loop 
> isn't. There isn't any guarantee that the executor async loop gets to run 
> when the test does a sequence like
> {code}
> Time.advanceClusterTime
> cluster.waitForIdle
> {code}
> because the waitForIdle check doesn't know about the executor async loop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to