[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742476#comment-13742476
 ] 

Koushik Das commented on CLOUDSTACK-3441:
-----------------------------------------

In the latest logs, the delay is in acquiring the lock on the network. But this 
time the reason is different.
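
To put a number on the delay from a job log, the gap between the "VM is being 
created in podId" entry and the "Lock is acquired for network id" entry of the 
same job can be measured. Below is a minimal sketch in Python; it assumes the 
timestamp layout seen in the log excerpts on this issue, and the function name 
lock_wait_seconds is only illustrative (it is not part of CloudStack).

```python
from datetime import datetime

# Timestamp layout assumed from the management server log excerpts on this
# issue, e.g. "2013-06-10 15:51:01,215".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
LOG_TS_LENGTH = 23  # length of "YYYY-MM-DD HH:MM:SS,mmm"


def parse_log_timestamp(line):
    """Parse the leading timestamp of a single log entry."""
    return datetime.strptime(line[:LOG_TS_LENGTH], LOG_TS_FORMAT)


def lock_wait_seconds(allocated_line, lock_line):
    """Seconds elapsed between the pod allocation entry and the lock entry."""
    delta = parse_log_timestamp(lock_line) - parse_log_timestamp(allocated_line)
    return delta.total_seconds()
```

For the two entries of job-3014 quoted in the issue description, this gives a 
wait of roughly four and a half minutes.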

The earlier issue was caused by CLOUDSTACK-70, and the logs suggested the 
same:

2013-06-10 15:55:31,807 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] 
(Job-Executor-1004:job-3014) Skipping VR deployment: Found a running or 
starting VR in Pod null id=1
2013-06-10 15:55:31,810 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] 
(Job-Executor-1004:job-3014) Skipping VR deployment: Found a running or 
starting VR in Pod null id=2
2013-06-10 15:55:31,812 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] 
(Job-Executor-1004:job-3014) Skipping VR deployment: Found a running or 
starting VR in Pod null id=3

In the latest logs the delay is still in acquiring the lock, but the reason is 
something else, as the above log messages are no longer seen.

In the earlier log the lock was acquired for the network only twice during 
deploy VM. But in the latest one I see that the lock is getting acquired 3 
times. While debugging the deployVM code I had seen that some of the planner 
methods (related to dedicated resources) were getting called twice. I suspect that 
the reason for the network lock getting acquired one extra time may be the 
same. There may be some new change introduced due to which the same code path 
gets executed twice.
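
The number of acquisitions per job can be checked directly against a job log. 
Below is a minimal sketch in Python; it assumes each log entry is on a single 
line in the raw log file and that the message text matches the "Lock is 
acquired for network id" entries quoted on this issue. The function name 
count_lock_acquisitions is only illustrative.

```python
import re
from collections import Counter

# Pattern assumed from the log excerpts on this issue, e.g.
# "(Job-Executor-1004:job-3014) Lock is acquired for network id 204 ...".
# Adjust it if the actual log format differs.
LOCK_ACQUIRED = re.compile(
    r"\([^:()]+:(?P<job>job-\d+)\) "
    r"Lock is acquired for network id (?P<network>\d+)"
)


def count_lock_acquisitions(lines):
    """Count lock acquisitions per (job, network id) pair."""
    counts = Counter()
    for line in lines:
        match = LOCK_ACQUIRED.search(line)
        if match:
            key = (match.group("job"), int(match.group("network")))
            counts[key] += 1
    return counts
```

Running this over the earlier and the latest job logs would show whether the 
count for a given job and network really went from two to three.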

Also, due to some of the new features (dedicated resources etc.), additional 
checks have been placed to exclude dedicated resources from being considered for 
deployment. Due to this the overall time of the deploy VM operation is bound to 
increase. This will have an impact on the lock, as at any given time there will 
be more deploy VM threads active.

This locking logic existed even prior to 4.x, but now due to the perf. 
degradation we need to see if the logic can be optimized.
                
> [Load Test] High delays between VM being allocated to Pod and network 
> implementation causing delays in VM deployment
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-3441
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3441
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Management Server, Network Controller
>    Affects Versions: 4.2.0
>         Environment: Load test, with simulator setup
>            Reporter: Sowmya Krishnan
>            Assignee: Koushik Das
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: job1792.log, job-3014_5mins
>
>
> Scale test set up with 20K simulated hosts, 20K VMs, basic zone, 3 
> management servers.
> Also using security groups; ~5K VMs are deployed in every security group.
> During deployment of simulator VMs, the following is observed after around 4K VMs:
> There is a delay between the VM being allocated to a Pod and network implementation, 
> causing delays in VM deployment. Following are the logs: 
> 2013-06-10 15:51:01,215 DEBUG [cloud.vm.VirtualMachineManagerImpl] 
> (Job-Executor-1004:job-3014) VM is being created in podId: 1390 
> 2013-06-10 15:55:31,494 DEBUG [cloud.network.NetworkManagerImpl] 
> (Job-Executor-1004:job-3014) Lock is acquired for network id 204 as a part of 
> network implement 
> This causes delay in VM deployment and in completion of the async job. The delays 
> get higher and higher as more and more VMs are deployed, causing an 
> unacceptable delay for huge deployments, viz. > 15-20K VMs

