[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822032#comment-13822032
 ] 

Sheng Yang commented on CLOUDSTACK-4540:
----------------------------------------

I found that, in theory, this issue is basically unavoidable.

The thing is, the VR takes time to execute each command; say it needs 
time t1 (which is greater than 0).

And the interval between parallel deployment requests is t2 (which can be 
almost 0).

In any case, the VR has to handle commands in sequence internally, so if 
t1 > t2, each new task in the VR waits longer and longer to execute, and some 
commands ultimately time out. No matter how long the timeout is, if enough 
tasks are queued for the VR, the last ones can time out.
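
The argument above can be illustrated with a minimal simulation. This is only 
a sketch of the queueing reasoning, not CloudStack code: the function names 
and the values chosen for t1, t2 and the timeout are hypothetical and were not 
measured on any setup.

```python
def completion_latencies(n_commands, t1, t2):
    """Latency (submission to completion) of each command sent to the VR.

    Command i is submitted at time i * t2. The VR runs commands strictly
    one at a time, and each command takes t1 to execute.
    """
    latencies = []
    vr_free_at = 0.0  # time at which the VR finishes its current backlog
    for i in range(n_commands):
        submitted = i * t2
        start = max(submitted, vr_free_at)  # wait if the VR is still busy
        vr_free_at = start + t1
        latencies.append(vr_free_at - submitted)
    return latencies


def first_timeout(latencies, timeout):
    """Index of the first command whose latency exceeds timeout, or None."""
    for i, latency in enumerate(latencies):
        if latency > timeout:
            return i
    return None


# Illustrative values only: t1 > t2, so the backlog grows with every command.
latencies = completion_latencies(n_commands=30, t1=2.0, t2=0.5)
print(latencies[0], latencies[-1])
print(first_timeout(latencies, timeout=30.0))
```

With t1 > t2 the latency grows linearly with the position in the queue, so 
for any fixed timeout a large enough batch will push the last commands past 
it. With t1 <= t2 the latency stays at t1 for every command.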

Currently the VR has a robust mechanism for sequencing jobs internally, and I 
confirmed that it works well in this case. But there is no way to fix this 
issue if the VR is at 100% load all the time.

We can probably improve the speed of the VR’s internal execution, but it seems 
the ultimate answer is: set execute.in.sequence.network.element.commands to 
true. The VR doesn’t know how long the mgmt. server will wait before timing 
out; only the mgmt. server knows that.

I tested deploying 30 VMs on Shweta’s setup. With parallel execution of 
commands, roughly the last 6~7 failed due to timeout (with a lot of 
lock-pending messages in /var/log/messages, though the locks were all cleared 
after execution completed). There were no failures with parallel execution 
turned off for network element commands.
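
The difference between the two modes can be sketched with a toy model. It 
rests on an assumption that is not spelled out above: that in sequence mode 
the sender holds each command until the previous one has been answered, so a 
command's timeout clock only starts once the VR is free to run it. The 
function name and all values are hypothetical, chosen for illustration only.

```python
def count_timeouts(n_commands, t1, timeout, in_sequence):
    """Count commands whose timeout clock expires before the VR answers.

    Assumption: all commands are ready at time 0. In parallel mode every
    command is sent (and its clock started) immediately. In sequence mode
    the sender holds each command until the previous one has been answered.
    """
    timed_out = 0
    vr_free_at = 0.0  # time at which the VR finishes its current backlog
    for _ in range(n_commands):
        sent_at = vr_free_at if in_sequence else 0.0
        vr_free_at += t1  # the VR runs commands one at a time
        if vr_free_at - sent_at > timeout:
            timed_out += 1
    return timed_out


# Illustrative values only, not measurements.
print(count_timeouts(30, t1=2.0, timeout=30.0, in_sequence=False))
print(count_timeouts(30, t1=2.0, timeout=30.0, in_sequence=True))
```

In this model the total work done by the VR is the same in both modes; the 
only thing that changes is where the commands wait, and therefore whether the 
waiting time counts against the timeout.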

So setting execute.in.sequence.network.element.commands to true is a solution.

> Parallel deployment - Vmware - When deploying 30 parallel Vms , 16 Vms fails 
> to get deployed due to "VmDataCommand failed due to Exception: 
> java.lang.Exception Message: Timed out in waiting SSH execution result"
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-4540
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4540
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Management Server
>    Affects Versions: 4.2.0
>         Environment: Build from 4.2-forward.
>            Reporter: Sangeetha Hariharan
>            Assignee: Sheng Yang
>            Priority: Blocker
>             Fix For: 4.3.0
>
>         Attachments: management-server.log
>
>
> Parallel deployment - Vmware - When deploying 30 parallel Vms , 16 Vms fails 
> to get deployed due to "VmDataCommand failed due to Exception: 
> java.lang.Exception
> Message: Timed out in waiting SSH execution result"
> Set up - Advanced zone with 1 Vmware 5.0.0 Esxi host.
> Deploy 30 Vms in parallel.
> 16 out of 30 vms deployed in parallel , failed due to "VmDataCommand failed 
> due to Exception: java.lang.Exception
> Message: Timed out in waiting SSH execution result"
> Following exception seen in Management server logs:
> 2013-08-28 10:26:58,939 ERROR [vmware.resource.VmwareResource] 
> (DirectAgent-21:10.223.58.66) VmDataCommand failed due to Exception: 
> java.lang.Exception
> Message: Timed out in waiting SSH execution result
> java.lang.Exception: Timed out in waiting SSH execution result
>         at com.cloud.utils.ssh.SshHelper.sshExecute(SshHelper.java:166)
>         at com.cloud.utils.ssh.SshHelper.sshExecute(SshHelper.java:37)
>         at 
> com.cloud.hypervisor.vmware.resource.VmwareResource.execute(VmwareResource.java:2470)
>         at 
> com.cloud.hypervisor.vmware.resource.VmwareResource.executeRequest(VmwareResource.java:441)
>         at 
> com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)
> 2013-08-28 10:26:58,940 DEBUG [agent.manager.DirectAgentAttache] 
> (DirectAgent-21:null) Seq 1-170983503: Response Received:
> 2013-08-28 10:26:58,941 DEBUG [agent.transport.Request] (DirectAgent-21:null) 
> Seq 1-170983503: Processing:  { Ans: , MgmtId: 7083743249448, via: 1, Ver: 
> v1, Flags: 10, 
> [{"com.cloud.agent.api.Answer":{"result":true,"wait":0}},{"com.cloud.agent.api.Answer":{"result":false,"details":"VmDataCommand
>  failed due to Exception: java.lang.Exception\nMessage: Timed out in waiting 
> SSH execution result\n","wait":0}}] }
> 2013-08-28 10:26:58,941 DEBUG [agent.transport.Request] 
> (Job-Executor-29:job-398 = [ b3a34f25-37b2-4f33-b183-c0ea348d7af9 ]) Seq 
> 1-170983503: Received:  { Ans: , MgmtId: 7083743249448, via: 1, Ver: v1, 
> Flags: 10, { Answer, Answer } }
> 2013-08-28 10:26:58,979 INFO  [cloud.vm.VirtualMachineManagerImpl] 
> (Job-Executor-29:job-398 = [ b3a34f25-37b2-4f33-b183-c0ea348d7af9 ]) Unable 
> to contact resource.
> com.cloud.exception.ResourceUnavailableException: Resource [DataCenter:1] is 
> unreachable: Unable to apply userdata and password entry on router
>         at 
> com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:3808)
>         at 
> com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyUserData(VirtualNetworkApplianceManagerImpl.java:2993)
>         at 
> com.cloud.network.element.VirtualRouterElement.addPasswordAndUserdata(VirtualRouterElement.java:926)
>         at 
> com.cloud.network.NetworkManagerImpl.prepareElement(NetworkManagerImpl.java:2076)
>         at 
> com.cloud.network.NetworkManagerImpl.prepareNic(NetworkManagerImpl.java:2191)
>         at 
> com.cloud.network.NetworkManagerImpl.prepare(NetworkManagerImpl.java:2127)
>         at 
> com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:886)
>         at 
> com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:578)
>         at 
> org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:227)
>         at 
> org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209)
>         at 
> com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3406)
>         at 
> com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2966)
>         at 
> com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2952)
>         at 
> com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
>         at 
> org.apache.cloudstack.api.command.user.vm.DeployVMCmd.execute(DeployVMCmd.java:420)
>         at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:158)
>         at 
> com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:531)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>         at java.lang.Thread.run(Thread.java:679)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
