[
https://issues.apache.org/jira/browse/STORM-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Li resolved STORM-3622.
-----------------------------
Fix Version/s: 2.2.0
Resolution: Fixed
Merged into master
> Race Condition in CachedThreadStatesGaugeSet registered at SystemBolt
> ---------------------------------------------------------------------
>
> Key: STORM-3622
> URL: https://issues.apache.org/jira/browse/STORM-3622
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Ethan Li
> Assignee: Ethan Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.2.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We noticed that with the change in https://github.com/apache/storm/pull/3242,
> there is a race condition causing NPE.
> {code:java}
> 2020-04-14 18:22:12.997 o.a.s.u.Utils Thread-17-__acker-executor[16, 16]
> [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.NullPointerException
> at org.apache.storm.executor.Executor.accept(Executor.java:291)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:131)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> at org.apache.storm.utils.JCQueue.consume(JCQueue.java:111)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:172)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> at org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:159)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> at org.apache.storm.utils.Utils$1.run(Utils.java:434)
> [storm-client-2.2.0.y.jar:2.2.0.y]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
> Caused by: java.lang.NullPointerException
> at
> com.codahale.metrics.jvm.ThreadStatesGaugeSet.getThreadCount(ThreadStatesGaugeSet.java:95)
> ~[metrics-jvm-3.2.6.jar:3.2.6]
> at
> com.codahale.metrics.jvm.ThreadStatesGaugeSet.access$000(ThreadStatesGaugeSet.java:20)
> ~[metrics-jvm-3.2.6.jar:3.2.6]
> at
> com.codahale.metrics.jvm.ThreadStatesGaugeSet$1.getValue(ThreadStatesGaugeSet.java:56)
> ~[metrics-jvm-3.2.6.jar:3.2.6]
> at org.apache.storm.executor.Executor.addV2Metrics(Executor.java:344)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> at org.apache.storm.executor.Executor.metricsTick(Executor.java:320)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> at
> org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:218)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> at org.apache.storm.executor.Executor.accept(Executor.java:287)
> ~[storm-client-2.2.0.y.jar:2.2.0.y]
> ... 6 more
> {code}
> This is due to a race condition in CachedGauge
> https://github.com/dropwizard/metrics/blob/v3.2.6/metrics-core/src/main/java/com/codahale/metrics/CachedGauge.java#L49-L53
> There are two issues here.
> The first one is STORM-3623.
> This makes all the executors to get values for all the metrics. So multiple
> threads will access the same metric.
> So the threads gauges are now accessed by multiple threads. But in
> CachedGauge,
> {code:java}
> @Override
> public T getValue() {
> if (shouldLoad()) {
> this.value = loadValue();
> }
> return value;
> }
> {code}
> this method is not thread-safe. Two threads can reach to getValue at the same
> time.
> The first thread reaching shouldLoad knows it needs to reload, so it calls
> the next line this.value=loadValue()
> The second thread is a little bit late so shouldLoad returns false. Then it
> returns the value directly.
> There is a race condition between first thread calling loadValue() and the
> second thread returning value.
> If the first thread finishes loadValue() first, both values returned to the
> threads are the same value (and current value). But if the second thread
> returns earlier, the second thread gets the original value (which is null ),
> hence NPE.
> To summarize, the second issue is CachedThreadStatesGaugeSet is not
> thread-safe
> To fix this NPE, we should avoid using CachedThreadStatesGaugeSet.
> But we still need to fix STORM-3623 to avoid unnecessary computations and
> redundant metrics.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)