[
https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Li updated STORM-3767:
----------------------------
Fix Version/s: 2.3.0
> NPE on getComponentPendingProfileActions
> -----------------------------------------
>
> Key: STORM-3767
> URL: https://issues.apache.org/jira/browse/STORM-3767
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 2.0.0, 2.1.0, 2.2.0
> Reporter: Ethan Li
> Assignee: Ethan Li
> Priority: Major
> Fix For: 2.3.0
>
> Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When a topology is newly submitted and the scheduling loop takes too long,
> its component UI page may return an HTTP 500 error.
> This is caused by an NPE in the nimbus code. An example:
> 1. When a scheduling loop finishes, nimbus eventually updates the
> assignmentsBackend. If a topology is newly submitted, its entry is added
> to the idToAssignment map; otherwise, the entry is updated with the new
> assignments. The key point is that the new topology id does not exist in
> idToAssignment until this point is reached.
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64
> 2. However, this assignmentsBackend update only started at
> 2021-04-23 15:30:14.299
> {code:java}
> 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment
> for topology
> {code}
> while the topology topo1-52-1619191499 had already been scheduled at 2021-04-23
> 15:25:13.887. The scheduling loop took longer than 5 minutes.
> {code:java}
> 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS -
> topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy
> (1297 states traversed in 1275 ms, backtracked 0 times)
> (other topologies were taking a long time)
> 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS -
> topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy
> (111 states traversed in 34 ms, backtracked 0 times)
> ...
> 2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS -
> TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting
> lower priority topologies. Additional Memory Required: 20128.0 MB (Available:
> 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 %
> CPU).Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed
> in 299804 ms, backtracked 65555 times, 89 of 150 executors scheduled)
> ...
> 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS -
> evaluateplus-dev-47-1605825401 Running - Fully Scheduled by
> GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0
> times)
> {code}
> 3. During this period, the idToAssignment map in assignmentsBackend did not
> have an entry for topo1-52-1619191499, so when its component UI was visited,
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69
> the lookup returned a null assignment, which caused the NPE.
> This can be reproduced easily by adding a sleep anywhere between
> {code:title=Nimbus.java}
> Map<String, SchedulerAssignment> newSchedulerAssignments =
> computeNewSchedulerAssignments(existingAssignments,
> topologies, bases, scratchTopoId);
> {code}
> and
> {code:title=Nimbus.java}
> state.setAssignment(topoId, assignment, td.getConf());
> {code}
> and then submitting a new topology and visiting its component UI.
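
The window described in steps 1 to 3 can be illustrated with a minimal, self-contained Java sketch. All names here (AssignmentLookupSketch, Backend, Assignment, nodes, nodesUnguarded, nodesGuarded) are hypothetical stand-ins and not Storm's real classes or methods; the only behavior assumed is that the backend is a map keyed by topology id and that a lookup for a missing key returns null. The null guard shown is one possible mitigation under those assumptions, not necessarily the change committed for this issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration only; not Storm's actual classes.
final class AssignmentLookupSketch {

    /** Simplified assignment holding only the field that the lookup dereferences. */
    static final class Assignment {
        final List<String> nodes;

        Assignment(List<String> nodes) {
            this.nodes = nodes;
        }
    }

    /** Simplified in-memory backend: topology id mapped to its assignment. */
    static final class Backend {
        private final Map<String, Assignment> idToAssignment = new ConcurrentHashMap<>();

        void setAssignment(String topoId, Assignment assignment) {
            idToAssignment.put(topoId, assignment);
        }

        // Returns null when the topology has no entry yet.
        Assignment getAssignment(String topoId) {
            return idToAssignment.get(topoId);
        }
    }

    /** Unguarded lookup: throws NullPointerException if the entry is missing. */
    static List<String> nodesUnguarded(Backend backend, String topoId) {
        return backend.getAssignment(topoId).nodes;
    }

    /** Guarded lookup: a missing entry is treated as "nothing assigned yet". */
    static List<String> nodesGuarded(Backend backend, String topoId) {
        Assignment assignment = backend.getAssignment(topoId);
        return assignment == null ? Collections.<String>emptyList() : assignment.nodes;
    }

    public static void main(String[] args) {
        Backend backend = new Backend();
        String topoId = "topo1-52-1619191499";

        // Topology is submitted, but the scheduling loop has not stored an assignment yet.
        System.out.println("guarded, before update: " + nodesGuarded(backend, topoId));
        try {
            nodesUnguarded(backend, topoId);
        } catch (NullPointerException e) {
            System.out.println("unguarded, before update: NullPointerException");
        }

        // The scheduling loop finishes and the backend is updated.
        backend.setAssignment(topoId, new Assignment(Arrays.asList("node-a", "node-b")));
        System.out.println("unguarded, after update: " + nodesUnguarded(backend, topoId));
    }
}
```

Whether an empty result is the right answer for a topology that has been submitted but not yet assigned is a design choice; the caller could instead report the missing assignment explicitly.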