[ https://issues.apache.org/jira/browse/CLOUDSTACK-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085381#comment-14085381 ]

Alena Prokharchyk commented on CLOUDSTACK-7209:
-----------------------------------------------

The root cause of the bug is the commit below:

commit 47d6a64b319ab064c4b855346f2bfdb250fb9ad8
Author: Koushik Das <[email protected]>
Date:   Fri Jul 25 15:17:35 2014 +0530

    CLOUDSTACK-7182: NPE while trying to deploy VMs in parallel in isolated network
    The following changes are made:
    - Check to see if network is implemented changed from 'state == Implementing||Implemented' to 'state == Implemented'.
    The earlier check was a hack to prevent the issue described below.
    - At the time of implementing network (using implementNetwork() method), if the VR needs to be deployed then
    it follows the same path of regular VM deployment. This leads to a nested call to implementNetwork() while
    preparing VR nics. This flow creates issues in dealing with network state transitions. The original call
    puts network in "Implementing" state and then the nested call again tries to put it into same state resulting
    in issues. In order to avoid it, implementNetwork() call for VR is replaced with below code.


This happened in a VPC environment. Network implement was called twice on the 
same network in the following scenario:

* Create a VPC
* Add a network to the VPC
* Deploy a VM in the network

The first user VM tried to implement the network as a part of its start, which 
put the network into the Implementing state:

2014-07-27 20:16:24,325 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Lock is acquired for network id 222 as a part of network implement
2014-07-27 20:16:24,325 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Asking ExternalGuestNetworkGuru to implement Ntwk[222|Guest|11]

Then the same thread tried to implement the network all over again as a part of 
plugging the nic into the VPC VR. The lock was granted right away in this case 
because it was requested by the thread that already held it:

2014-07-27 20:16:24,592 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Lock is acquired for network id 222 as a part of network implement
2014-07-27 20:16:24,592 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Asking ExternalGuestNetworkGuru to implement Ntwk[222|Guest|11]
2014-07-27 20:16:24,594 ERROR [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Unable to transition to a new state from Implementing via ImplementNetwork

The second attempt then failed, because the network cannot transition from the 
Implementing state to the Implementing state again.
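
The sequence in the logs above can be illustrated with a minimal Java sketch. 
It is not CloudStack code: the Network class, the State enum and the method 
bodies are hypothetical stand-ins, and it assumes a re-entrant lock plus a 
state machine that only accepts the ImplementNetwork event from the Allocated 
state. It only models why the nested call gets the lock immediately and then 
fails on the state transition.

```java
import java.util.concurrent.locks.ReentrantLock;

class NestedImplementSketch {
    enum State { Allocated, Implementing, Implemented }

    // Hypothetical stand-in for a network record; not a CloudStack class.
    static class Network {
        State state = State.Allocated;
        final ReentrantLock lock = new ReentrantLock();
    }

    // Assumption: the ImplementNetwork event is only valid from Allocated.
    static void transitionToImplementing(Network n) {
        if (n.state != State.Allocated) {
            throw new IllegalStateException(
                "Unable to transition to a new state from " + n.state
                + " via ImplementNetwork");
        }
        n.state = State.Implementing;
    }

    static void implementNetwork(Network n, boolean plugRouterNic) {
        // Re-entrant: a thread that already holds the lock gets it again at once.
        n.lock.lock();
        try {
            if (n.state == State.Implemented) {
                return; // stricter check: only Implemented counts as done
            }
            transitionToImplementing(n);
            if (plugRouterNic) {
                // Nested call on the same thread, as when the VR nic is plugged.
                implementNetwork(n, false);
            }
            n.state = State.Implemented;
        } finally {
            n.lock.unlock();
        }
    }

    // Runs the scenario and returns "ok" or the failure message.
    static String run() {
        Network n = new Network();
        try {
            implementNetwork(n, true);
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the lock never blocks the nested call, so the lock alone cannot 
protect against the double transition when both calls run on one thread.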

The bug was fixed when the commit mentioned above was reverted. The revert 
happened with this commit:

commit f47cfc6eb16bf0fa5830327207a2d3fdf24ab700
Author: Sheng Yang <[email protected]>
Date:   Mon Jul 28 15:47:44 2014 -0700

    CLOUDSTACK-7186: Revert "CLOUDSTACK-7182: NPE while trying to deploy VMs in parallel in isolated network"
    
    This reverts commit 47d6a64b319ab064c4b855346f2bfdb250fb9ad8, which broke VPC
    completely.
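
As a rough illustration of why the revert helps, the Java sketch below 
compares the two "is the network implemented" checks quoted in the 
CLOUDSTACK-7182 commit message. The class, enum and method names are 
hypothetical and the real check in NetworkOrchestrator may look different. 
The assumption is that a call which treats the network as already implemented 
returns without attempting another state transition.

```java
class ImplementedCheckSketch {
    enum State { Allocated, Implementing, Implemented }

    // Earlier check, restored by the revert (per the commit message):
    // 'state == Implementing||Implemented'.
    static boolean earlierCheck(State s) {
        return s == State.Implementing || s == State.Implemented;
    }

    // Check introduced by the CLOUDSTACK-7182 commit: 'state == Implemented'.
    static boolean stricterCheck(State s) {
        return s == State.Implemented;
    }

    // Assumption: a nested implement call only attempts a new transition
    // when the check does not treat the network as already implemented.
    static boolean nestedCallAttemptsTransition(State s, boolean useEarlierCheck) {
        boolean treatedAsDone = useEarlierCheck ? earlierCheck(s) : stricterCheck(s);
        return !treatedAsDone;
    }

    public static void main(String[] args) {
        // Network is already in Implementing when the nested call arrives.
        System.out.println(nestedCallAttemptsTransition(State.Implementing, true));
        System.out.println(nestedCallAttemptsTransition(State.Implementing, false));
    }
}
```

With the earlier check the nested call is skipped for a network in the 
Implementing state. With the stricter check it goes on to attempt the 
Implementing to Implementing transition.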

> [Automation] NPE observed in the CI Simulator Run on master
> -----------------------------------------------------------
>
>                 Key: CLOUDSTACK-7209
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7209
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>    Affects Versions: 4.5.0
>         Environment: CI
>            Reporter: Raja Pullela
>            Assignee: Alena Prokharchyk
>            Priority: Critical
>             Fix For: 4.5.0
>
>         Attachments: vmops.log
>
>
> The following NPE is observed in the CI environment:
> 02:17:16,156 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Lock is released for network Ntwk[222|Guest|11] as a part of network shutdown
> 2014-07-28 02:17:16,156 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Lock is released for network id 222 as a part of network implement
> 2014-07-28 02:17:16,158 WARN  [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Failed to add router VM[DomainRouter|r-38-VM] to network Ntwk[222|Guest|11] due to 
> java.lang.NullPointerException
>       at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.createNicForVm(NetworkOrchestrator.java:3077)
>       at com.cloud.vm.VirtualMachineManagerImpl.orchestrateAddVmToNetwork(VirtualMachineManagerImpl.java:3409)
>       at com.cloud.vm.VirtualMachineManagerImpl.addVmToNetwork(VirtualMachineManagerImpl.java:3355)
>       at com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl.addVpcRouterToGuestNetwork(VpcVirtualNetworkApplianceManagerImpl.java:267)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
>       at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
>       at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
>       at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
>       at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
>       at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
>       at com.sun.proxy.$Proxy191.addVpcRouterToGuestNetwork(Unknown Source)
>       at com.cloud.network.element.VpcVirtualRouterElement.implement(VpcVirtualRouterElement.java:187)
>       at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.implementNetworkElementsAndResources(NetworkOrchestrator.java:1088)
>       at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.implementNetwork(NetworkOrchestrator.java:995)
>       at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.prepare(NetworkOrchestrator.java:1282)
>       at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:985)
>       at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:5146)
>       at sun.reflect.GeneratedMethodAccessor358.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>       at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5302)
>       at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>       at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:503)
>       at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>       at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>       at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>       at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>       at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>       at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:460)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> 2014-07-28 02:17:16,160 DEBUG [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Removing the router VM[DomainRouter|r-38-VM] from network Ntwk[222|Guest|11] as a part of cleanup
> 2014-07-28 02:17:16,161 DEBUG [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Router VM[DomainRouter|r-38-VM] is not a part of the Guest network Ntwk[222|Guest|11]
> 2014-07-28 02:17:16,161 DEBUG [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Removed the router VM[DomainRouter|r-38-VM] from network Ntwk[222|Guest|11] as a part of cleanup
> 2014-07-28 02:17:16,161 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Cleaning up because we're unable to implement the network Ntwk[222|Guest|11]
> 2014-07-28 02:17:16,169 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Lock is acquired for network Ntwk[222|Guest|11] as a part of network shutdown
> 2014-07-28 02:17:16,174 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Releasing 0 port forwarding rules for network id=222 as a part of shutdownNetworkRules
> 2014-07-28 02:17:16,174 DEBUG [c.c.n.f.FirewallManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) There are no rules to forward to the network elements
> 2014-07-28 02:17:16,175 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Releasing 0 static nat rules for network id=222 as a part of shutdownNetworkRules
> 2014-07-28 02:17:16,175 DEBUG [c.c.n.f.FirewallManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) There are no rules to forward to the network elements
> 2014-07-28 02:17:16,175 DEBUG [c.c.n.l.LoadBalancingRulesManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Revoking 0 Public load balancing rules for network id=222
> 2014-07-28 02:17:16,176 DEBUG [c.c.n.l.LoadBalancingRulesManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) There are no Load Balancing Rules to forward to the network elements



--
This message was sent by Atlassian JIRA
(v6.2#6252)
