[
https://issues.apache.org/jira/browse/CLOUDSTACK-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chip Childers reopened CLOUDSTACK-2568:
---------------------------------------
reopening due to cherry-pick conflicts. let's only resolve when the code is in
the correct branch. Thanks Prachi!
> ACS41 regression in storage subsystem (seen with local storage and 2 or more
> hosts)
> -----------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-2568
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2568
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Management Server
> Affects Versions: 4.1.0
> Environment: RHES64 as in OEL64. Install from RPM built from latest
> GIT on OEL64.
> 2 or more KVM hypervisors with local storage in one cluster that has one
> primary NFS storage pool.
> Reporter: Ove Ewerlid
> Assignee: Prachi Damle
> Priority: Blocker
> Fix For: 4.1.0
>
> Attachments: var-log-cloudstack-management.tar.gz
>
>
> ACS402 works with no issues when tested in exactly the setup where ACS41
> fails.
> Identical configuration (the same setup program is used for testing both
> versions).
> In ACS410 startVM fails if and only if the advanceStart: log line picks a
> poolID that is not valid.
> E.g., the poolID reported in this logline appears random across a large
> number of tests.
> If a poolID that can not be reached by the host selected for deployment, the
> startVM fails.
> This is blocking upgrade from 4.0 to 4.1 since there is no reliable way to
> start VMs that have been deployed. If a deployed VM fails to start, giving
> the startVM command multiple times, will eventually make the VM start.
> The more hosts there are, the less likely it is a startVM will succeed. It is
> less likely that the poolID is correct.
> The below log portion conveys how the VM has a "correct" Deployment
> Destination reported and the advanceStart reports a poolID that is different
> and since the selected hypervisor can not reach the poolID the startVM fails.
> The bug never triggers if there is only one KVM with local storage since the
> poolId can not be wrong, there is just one (and the NFS pool is always valid).
> -------------------
> 2013-05-20 06:49:29,477 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Found a potential host id: 1 name:
> vm3-net0-s0-14.test.devops and associated storage pools for this VM
> 2013-05-20 06:49:29,478 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Returning Deployment Destination:
> Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))]
> : Dest[Zone(1)-Pod(1)-Cluster(1)-Host\
> (1)-Storage(Volume(10|ROOT-->Pool(200), Volume(11|DATADISK-->Pool(200))]
> 2013-05-20 06:49:29,495 DEBUG [cloud.capacity.CapacityManagerImpl]
> (Job-Executor-34:job-34) VM state transitted from :Stopped to Starting with
> event: StartRequestedvm's original host id: null new host id: null host id
> before state trans\
> ition: null
> 2013-05-20 06:49:29,495 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (Job-Executor-34:job-34) Successfully transitioned to start state for
> VM[User|testvm-a] reservation id = e644d55e-3627-4395-9f89-639e6fc2f261
> 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (Job-Executor-34:job-34) Trying to deploy VM, vm has dcId: 1 and podId: null
> 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (Job-Executor-34:job-34) advanceStart: DeploymentPlan is provided, using
> dcId:1, podId: 1, clusterId: 1, hostId: 1, poolId: 201
> 2013-05-20 06:49:29,502 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> (Job-Executor-34:job-34) Deploy avoids pods: null, clusters: null, hosts: null
> 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) DeploymentPlanner allocation algorithm: random
> 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Trying to allocate a host and storage pools from
> dc:1, pod:1,cluster:1, requested cpu: 4000, requested ram: 2147483648
> 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Is ROOT volume READY (pool already allocated)?: Yes
> 2013-05-20 06:49:29,504 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) DeploymentPlan has host_id specified, making no
> checks on this host, looks like admin test: 1
> 2013-05-20 06:49:29,505 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Looking for suitable pools for this host under zone:
> 1, pod: 1, cluster: 1
> 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Checking suitable pools for volume (Id, Type):
> (10,ROOT)
> 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Volume has pool(201) already allocated, checking if
> pool can be reused, poolId: null
> 2013-05-20 06:49:29,506 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) finding pool by id '201'
> 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Planner need not allocate a pool for this volume
> since its READY
> 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Checking suitable pools for volume (Id, Type):
> (11,DATADISK)
> 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Volume has pool(201) already allocated, checking if
> pool can be reused, poolId: null
> 2013-05-20 06:49:29,507 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) finding pool by id '201'
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Planner need not allocate a pool for this volume
> since its READY
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Trying to find a potenial host and associated
> storage pools from the suitable host/pool lists for this VM
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Checking if host: 1 can access any suitable storage
> pool for volume: DATADISK
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Host: 1 cannot access pool: 201
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Could not find a potential host that has associated
> storage pools from the suitable host/pool lists for this VM
> 2013-05-20 06:49:29,508 DEBUG [cloud.deploy.FirstFitPlanner]
> (Job-Executor-34:job-34) Cannnot deploy to specified host, returning.
> 2013-05-20 06:49:29,524 DEBUG [cloud.capacity.CapacityManagerImpl]
> (Job-Executor-34:job-34) VM state transitted from :Starting to Stopped with
> event: OperationFailedvm's original host id: null new host id: null host id
> before state tran\
> sition: null
> 2013-05-20 06:49:29,533 ERROR [cloud.async.AsyncJobManagerImpl]
> (Job-Executor-34:job-34) Unexpected exception while executing
> org.apache.cloudstack.api.command.user.vm.StartVMCmd
> com.cloud.exception.InsufficientServerCapacityException: Unable to create a
> deployment for VM[User|testvm-a]Scope=interface com.cloud.dc.DataCenter; id=1
> at
> com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:728)
> at
> com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:471)
> at
> org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.deployVirtualMachine(VMEntityManagerImpl.java:212)
> at
> org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.deploy(VirtualMachineEntityImpl.java:209)
> at
> com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:3865)
> at
> com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2573)
> at
> com.cloud.utils.component.ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept(ComponentInstantiationPostProcessor.java:125)
> at
> org.apache.cloudstack.api.command.user.vm.StartVMCmd.execute(StartVMCmd.java:120)
> at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:162)
> at
> com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:437)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira