[
https://issues.apache.org/jira/browse/CLOUDSTACK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742476#comment-13742476
]
Koushik Das commented on CLOUDSTACK-3441:
-----------------------------------------
In the latest logs, the delay is in acquiring the lock on the network, but this
time the reason is different.
The earlier issue was caused by CLOUDSTACK-70. The logs also suggested the
same:
2013-06-10 15:55:31,807 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(Job-Executor-1004:job-3014) Skipping VR deployment: Found a running or
starting VR in Pod null id=1
2013-06-10 15:55:31,810 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(Job-Executor-1004:job-3014) Skipping VR deployment: Found a running or
starting VR in Pod null id=2
2013-06-10 15:55:31,812 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(Job-Executor-1004:job-3014) Skipping VR deployment: Found a running or
starting VR in Pod null id=3
In the latest logs the delay is still in acquiring the lock, but the reason is
something else, as the above messages are no longer seen.
In the earlier log the lock on the network was acquired only twice during
deploy VM, but in the latest one I see it being acquired 3 times. While
debugging the deployVM code I had seen some of the planner methods (related
to dedicated resources) getting called twice. I suspect that the network lock
is acquired one extra time for the same reason: some new change may have been
introduced due to which the same code path gets executed twice.
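To confirm the extra acquisition, one could count the "Lock is acquired"
messages per job in the management server log. Below is a minimal sketch in
Python, assuming the message format matches the line quoted in the issue
description. The function name count_lock_acquisitions and the regular
expression are only an illustration and are not part of CloudStack.

```python
import re
from collections import Counter

# Assumed message format, taken from the log line quoted in the issue description:
#   (Job-Executor-1004:job-3014) Lock is acquired for network id 204 ...
LOCK_LINE = re.compile(
    r"\((?P<executor>Job-Executor-\d+):(?P<job>job-\d+)\)\s+"
    r"Lock is acquired for network id (?P<network>\d+)"
)


def count_lock_acquisitions(log_text):
    """Return a Counter keyed by (job, network id) with the number of
    'Lock is acquired' messages seen for that pair."""
    # Log lines may be wrapped (as they are in this mail), so collapse
    # all whitespace runs into single spaces before matching.
    flat = " ".join(log_text.split())
    counts = Counter()
    for match in LOCK_LINE.finditer(flat):
        key = (match.group("job"), int(match.group("network")))
        counts[key] += 1
    return counts
```

A job that shows a count of 3 for its network where the earlier logs showed 2
would be consistent with the same code path being executed twice.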
Also, due to some of the new features (dedicated resources etc.), additional
checks have been placed to exclude dedicated resources from being considered
for deployment. Because of this the overall time of the deploy VM operation is
bound to increase. This will have an impact on the lock, as at any given time
there will be more deploy VM threads active.
This locking logic existed even prior to 4.x, but now, given the performance
degradation, we need to see if the logic can be optimized.
> [Load Test] High delays between VM being allocated to Pod and network
> implementation causing delays in VM deployment
> --------------------------------------------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-3441
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3441
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Management Server, Network Controller
> Affects Versions: 4.2.0
> Environment: Load test, with simulator setup
> Reporter: Sowmya Krishnan
> Assignee: Koushik Das
> Priority: Blocker
> Fix For: 4.2.0
>
> Attachments: job1792.log, job-3014_5mins
>
>
> Scale Test set up with 20K simulated hosts, 20K VMs, basic zone, 3
> Management servers
> Also using security groups, ~5K VMs are deployed in every Security group.
> While deployment of simulator VMs, following is observed after around 4K VMs:
> There's delay between VM being allocated to Pod and network implementation
> causing delays in VM deployment. Following are the logs:
> 2013-06-10 15:51:01,215 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (Job-Executor-1004:job-3014) VM is being created in podId: 1390
> 2013-06-10 15:55:31,494 DEBUG [cloud.network.NetworkManagerImpl]
> (Job-Executor-1004:job-3014) Lock is acquired for network id 204 as a part of
> network implement
> This causes delay in VM deployment and the async job to complete. The delays
> get higher and higher as more and more VMs are deployed causing an
> unacceptable delay for huge deployments, viz, > 15-20K VMs
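The gap between the two log lines quoted above can be computed directly from
their timestamps. A small Python sketch, assuming the timestamp format shown in
those lines (the helper name delay_seconds is hypothetical):

```python
from datetime import datetime

# Timestamp format as it appears in the quoted log lines,
# e.g. "2013-06-10 15:51:01,215" (milliseconds after the comma).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def delay_seconds(start_ts, end_ts):
    """Return the number of seconds elapsed between two log timestamps."""
    start = datetime.strptime(start_ts, TS_FORMAT)
    end = datetime.strptime(end_ts, TS_FORMAT)
    return (end - start).total_seconds()


# Timestamps copied from the two job-3014 lines in the issue description.
allocated_to_pod = "2013-06-10 15:51:01,215"
lock_acquired = "2013-06-10 15:55:31,494"
print(delay_seconds(allocated_to_pod, lock_acquired))
```

For job-3014 this comes to roughly 270 seconds, i.e. about 4.5 minutes between
the VM being allocated to the pod and the network lock being acquired.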