[ 
https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Li updated STORM-3767:
----------------------------
    Fix Version/s: 2.3.0

> NPE on getComponentPendingProfileActions 
> -----------------------------------------
>
>                 Key: STORM-3767
>                 URL: https://issues.apache.org/jira/browse/STORM-3767
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Major
>             Fix For: 2.3.0
>
>         Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a topology is newly submitted and the scheduling loop takes too long, 
> the component UI may return an HTTP 500 error.
> This is caused by an NPE in the Nimbus code. An example:
> 1. When a scheduling loop finishes, nimbus will eventually update the 
> assignmentsBackend. If a topology is newly submitted, its entry is added 
> to the idToAssignment map; otherwise, the existing entry is updated with the 
> new assignments. The key point is that the new topology ID does not exist in 
> idToAssignment until execution reaches this point.
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64
> 2. However, this assignmentsBackend update did not start until 
> 2021-04-23 15:30:14.299
> {code:java}
> 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment 
> for topology
> {code}
> while this topology, topo1-52-1619191499, had already been scheduled at 
> 2021-04-23 15:25:13.887. The scheduling loop took longer than 5 minutes.
> {code:java}
> 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS - 
> topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy 
> (1297 states traversed in 1275 ms, backtracked 0 times)
> other topologies were taking long time
> 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS - 
> topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy 
> (111 states traversed in 34 ms, backtracked 0 times)
> ...
> 2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS - 
> TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting 
> lower priority topologies. Additional Memory Required: 20128.0 MB (Available: 
> 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 % 
> CPU).Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed 
> in 299804 ms, backtracked 65555 times, 89 of 150 executors scheduled)
> ...
> 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS - 
> evaluateplus-dev-47-1605825401 Running - Fully Scheduled by 
> GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0 
> times)
> {code}
> 3. During this period, the idToAssignment map in assignmentsBackend wouldn't 
> have the entry for topo1-52-1619191499, so when a component UI was visited,
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69
> the lookup returned null for the assignment, which caused the NPE.
> This can be reproduced easily by adding a sleep anywhere between 
> {code:title=Nimbus.java}
>             Map<String, SchedulerAssignment> newSchedulerAssignments =
>                     computeNewSchedulerAssignments(existingAssignments,
>                             topologies, bases, scratchTopoId);
> {code}
> and
> {code:title=Nimbus.java}
>  state.setAssignment(topoId, assignment, td.getConf());
> {code}
> and then submitting a new topology and visiting its component UI.
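
The window described in the report can be illustrated with a small standalone sketch. It is not Storm code and not the fix applied for this issue: `SketchBackend`, `unguardedLength`, and `guardedLength` are hypothetical names, and the assignment value is simplified to a `String`. The only behavior assumed from the report is that the backend lookup is a plain map get, which returns null for a topology whose entry has not been written yet.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class AssignmentLookupSketch {

    // Hypothetical stand-in for an in-memory assignment backend.
    static class SketchBackend {
        private final Map<String, String> idToAssignment = new ConcurrentHashMap<>();

        // Plain map lookup: returns null when the topology has no entry yet.
        String getAssignment(String topoId) {
            return idToAssignment.get(topoId);
        }

        // Called once the scheduling loop has finished.
        void setAssignment(String topoId, String assignment) {
            idToAssignment.put(topoId, assignment);
        }
    }

    // Unguarded caller: dereferences the lookup result directly,
    // so it throws NullPointerException while the entry is missing.
    static int unguardedLength(SketchBackend backend, String topoId) {
        return backend.getAssignment(topoId).length();
    }

    // Guarded caller (one possible approach, an assumption of this sketch):
    // treats "no assignment yet" as an empty result.
    static int guardedLength(SketchBackend backend, String topoId) {
        return Optional.ofNullable(backend.getAssignment(topoId))
                .map(String::length)
                .orElse(0);
    }

    public static void main(String[] args) {
        SketchBackend backend = new SketchBackend();
        String topoId = "topo1-52-1619191499";

        // Topology is submitted, but the scheduling loop has not written its entry.
        try {
            unguardedLength(backend, topoId);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup failed with NPE");
        }
        System.out.println("guarded lookup before assignment: " + guardedLength(backend, topoId));

        // Scheduling loop finishes and the entry appears.
        backend.setAssignment(topoId, "node-a");
        System.out.println("guarded lookup after assignment: " + guardedLength(backend, topoId));
    }
}
```

Whether a missing assignment should map to an empty result or to a clearer error is a design choice that this sketch does not settle.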



