[
https://issues.apache.org/jira/browse/CLOUDSTACK-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085381#comment-14085381
]
Alena Prokharchyk commented on CLOUDSTACK-7209:
-----------------------------------------------
The root cause for the bug is the commit below:
commit 47d6a64b319ab064c4b855346f2bfdb250fb9ad8
Author: Koushik Das
Date: Fri Jul 25 15:17:35 2014 +0530
CLOUDSTACK-7182: NPE while trying to deploy VMs in parallel in isolated network
The following changes are made:
- Check to see if network is implemented changed from 'state == Implementing||Implemented' to 'state == Implemented'. The earlier check was a hack to prevent the issue described below.
- At the time of implementing network (using implementNetwork() method), if the VR needs to be deployed then it follows the same path of regular VM deployment. This leads to a nested call to implementNetwork() while preparing VR nics. This flow creates issues in dealing with network state transitions. The original call puts network in "Implementing" state and then the nested call again tries to put it into same state resulting in issues. In order to avoid it, implementNetwork() call for VR is replaced with below code.
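The two guard conditions quoted in the commit message can be compared side by side. The following is a minimal Java sketch that assumes a simplified network state enum. The enum and the method names `isImplementedOld`/`isImplementedNew` are hypothetical and are not CloudStack's actual code. The sketch shows why the earlier check let a nested call return early while the network was still Implementing, and why the new check does not.

```java
public class ImplementedCheckSketch {
    // Hypothetical subset of network states; not CloudStack's real enum.
    enum State { Allocated, Implementing, Implemented }

    // Check before commit 47d6a64: a network that is still being implemented
    // counts as implemented, so a nested implementNetwork() call returns early.
    static boolean isImplementedOld(State s) {
        return s == State.Implementing || s == State.Implemented;
    }

    // Check after commit 47d6a64: only a fully implemented network short-circuits,
    // so a nested call proceeds and tries to enter Implementing a second time.
    static boolean isImplementedNew(State s) {
        return s == State.Implemented;
    }

    public static void main(String[] args) {
        State current = State.Implementing; // the outer call has already set this state
        System.out.println("old check skips nested implement: " + isImplementedOld(current));
        System.out.println("new check skips nested implement: " + isImplementedNew(current));
    }
}
```

Running `main` prints `true` for the old check and `false` for the new one.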
The bug happened in a VPC environment. implementNetwork() was called twice on the same network in the following scenario:
* Create a VPC
* Add a network to the VPC
* Deploy a VM in the network
The first user VM tried to implement the network as part of its start, which put the network into the Implementing state:
2014-07-27 20:16:24,325 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Lock is acquired for network id 222 as a part of network implement
2014-07-27 20:16:24,325 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Asking ExternalGuestNetworkGuru to implement Ntwk[222|Guest|11]
The same thread then tries to implement the network again as part of plugging the NIC into the VPC VR. The lock is granted immediately in this case because it is re-acquired by the thread that already holds it:
2014-07-27 20:16:24,592 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Lock is acquired for network id 222 as a part of network implement
2014-07-27 20:16:24,592 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Asking ExternalGuestNetworkGuru to implement Ntwk[222|Guest|11]
2014-07-27 20:16:24,594 ERROR [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-36d7a7bb job-212/job-214 ctx-c8f6b988) Unable to transition to a new state from Implementing via ImplementNetwork
The transition from the Implementing state to the Implementing state then fails.
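The failure can be reproduced in miniature with a `java.util.concurrent.locks.ReentrantLock` and a toy state machine. This is a hypothetical sketch, not NetworkOrchestrator's code. The class, the transition table and the `nestedVrNicPlug` flag are assumptions, and the `ReentrantLock` stands in for the per-network lock. The sketch shows that a reentrant lock does not stop the owning thread from re-entering `implementNetwork()`. Only the state check can stop the nested call, and with the `state == Implemented` check it does not.

```java
import java.util.concurrent.locks.ReentrantLock;

public class NestedImplementSketch {
    // Hypothetical subset of network states and events; names mirror the log messages.
    enum State { Allocated, Implementing, Implemented }
    enum Event { ImplementNetwork, OperationSucceeded }

    // Toy transition table: ImplementNetwork is only legal from Allocated.
    static State transition(State from, Event event) {
        if (from == State.Allocated && event == Event.ImplementNetwork) {
            return State.Implementing;
        }
        if (from == State.Implementing && event == Event.OperationSucceeded) {
            return State.Implemented;
        }
        throw new IllegalStateException(
                "Unable to transition to a new state from " + from + " via " + event);
    }

    // Stand-in for the per-network lock; reentrant, so the owning thread never blocks.
    final ReentrantLock lock = new ReentrantLock();
    State state = State.Allocated;

    // Hypothetical implementNetwork(); nestedVrNicPlug simulates preparing the VR NIC,
    // which follows the regular VM deployment path and calls implementNetwork() again.
    void implementNetwork(boolean nestedVrNicPlug) {
        lock.lock();
        try {
            System.out.println("Lock is acquired, hold count = " + lock.getHoldCount());
            if (state == State.Implemented) {
                return; // post-47d6a64 check: only Implemented short-circuits
            }
            state = transition(state, Event.ImplementNetwork);
            if (nestedVrNicPlug) {
                implementNetwork(false); // same thread, so lock() succeeds immediately
            }
            state = transition(state, Event.OperationSucceeded);
        } finally {
            lock.unlock(); // runs on both the success and the exception path
        }
    }

    public static void main(String[] args) {
        NestedImplementSketch network = new NestedImplementSketch();
        try {
            network.implementNetwork(true);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("state = " + network.state + ", locked = " + network.lock.isLocked());
    }
}
```

Running `main` prints hold counts 1 and 2, then the transition error, then `state = Implementing, locked = false`. With the earlier `Implementing || Implemented` check in place of `state == State.Implemented`, the nested call would return before attempting the second transition.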
The bug was fixed when the commit mentioned above was reverted, in commit:
commit f47cfc6eb16bf0fa5830327207a2d3fdf24ab700
Author: Sheng Yang
Date: Mon Jul 28 15:47:44 2014 -0700
CLOUDSTACK-7186: Revert "CLOUDSTACK-7182: NPE while trying to deploy VMs in parallel in isolated network"
This reverts commit 47d6a64b319ab064c4b855346f2bfdb250fb9ad8, which broke VPC completely.
> [Automation] NPE observed in the CI Simulator Run on master
> -----------------------------------------------------------
>
> Key: CLOUDSTACK-7209
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7209
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public (Anyone can view this level - this is the default.)
> Affects Versions: 4.5.0
> Environment: CI
> Reporter: Raja Pullela
> Assignee: Alena Prokharchyk
> Priority: Critical
> Fix For: 4.5.0
>
> Attachments: vmops.log
>
>
> The following NPE is observed in the CI environment:
> 02:17:16,156 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Lock is released for network Ntwk[222|Guest|11] as a part of network shutdown
> 2014-07-28 02:17:16,156 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Lock is released for network id 222 as a part of network implement
> 2014-07-28 02:17:16,158 WARN [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Failed to add router VM[DomainRouter|r-38-VM] to network Ntwk[222|Guest|11] due to
> java.lang.NullPointerException
> at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.createNicForVm(NetworkOrchestrator.java:3077)
> at com.cloud.vm.VirtualMachineManagerImpl.orchestrateAddVmToNetwork(VirtualMachineManagerImpl.java:3409)
> at com.cloud.vm.VirtualMachineManagerImpl.addVmToNetwork(VirtualMachineManagerImpl.java:3355)
> at com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl.addVpcRouterToGuestNetwork(VpcVirtualNetworkApplianceManagerImpl.java:267)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
> at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
> at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
> at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
> at com.sun.proxy.$Proxy191.addVpcRouterToGuestNetwork(Unknown Source)
> at com.cloud.network.element.VpcVirtualRouterElement.implement(VpcVirtualRouterElement.java:187)
> at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.implementNetworkElementsAndResources(NetworkOrchestrator.java:1088)
> at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.implementNetwork(NetworkOrchestrator.java:995)
> at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.prepare(NetworkOrchestrator.java:1282)
> at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:985)
> at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:5146)
> at sun.reflect.GeneratedMethodAccessor358.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
> at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5302)
> at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
> at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:503)
> at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:460)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> 2014-07-28 02:17:16,160 DEBUG [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Removing the router VM[DomainRouter|r-38-VM] from network Ntwk[222|Guest|11] as a part of cleanup
> 2014-07-28 02:17:16,161 DEBUG [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Router VM[DomainRouter|r-38-VM] is not a part of the Guest network Ntwk[222|Guest|11]
> 2014-07-28 02:17:16,161 DEBUG [c.c.n.r.VpcVirtualNetworkApplianceManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Removed the router VM[DomainRouter|r-38-VM] from network Ntwk[222|Guest|11] as a part of cleanup
> 2014-07-28 02:17:16,161 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Cleaning up because we're unable to implement the network Ntwk[222|Guest|11]
> 2014-07-28 02:17:16,169 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Lock is acquired for network Ntwk[222|Guest|11] as a part of network shutdown
> 2014-07-28 02:17:16,174 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Releasing 0 port forwarding rules for network id=222 as a part of shutdownNetworkRules
> 2014-07-28 02:17:16,174 DEBUG [c.c.n.f.FirewallManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) There are no rules to forward to the network elements
> 2014-07-28 02:17:16,175 DEBUG [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Releasing 0 static nat rules for network id=222 as a part of shutdownNetworkRules
> 2014-07-28 02:17:16,175 DEBUG [c.c.n.f.FirewallManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) There are no rules to forward to the network elements
> 2014-07-28 02:17:16,175 DEBUG [c.c.n.l.LoadBalancingRulesManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) Revoking 0 Public load balancing rules for network id=222
> 2014-07-28 02:17:16,176 DEBUG [c.c.n.l.LoadBalancingRulesManagerImpl] (Work-Job-Executor-72:ctx-6d23d8df job-215/job-216 ctx-cb3c4a4d) There are no Load Balancing Rules to forward to the network elements
--
This message was sent by Atlassian JIRA
(v6.2#6252)