[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708299#comment-13708299
 ] 

Wei Zhou commented on CLOUDSTACK-2729:
--------------------------------------

Rayees / Edison,

This is a libvirt issue. I created a ticket for it.
 https://bugzilla.redhat.com/show_bug.cgi?id=977706
The patch is very simple.
https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html

Considering about this case: We use NFS server as the source of storage pool. 
When we expunged multiple VMs. Management server will distribute the expunge 
tasks to multiple nodes. Each thread on nodes will refresh the pool and delete 
volume, at almost the same time , so that the error will occur with great 
probability.

However, we also need to deal with it in CloudStack side, as most of clients do 
not use the latest libvirt.

Two ways to reduce the occuring count .
(1) refresh pool only if needed.
(2) add a file lock to avoid refreshing storage pools and deleting volumes at 
same time.

I have applied (2) on my platform. It can reduce the occuring count of the 
issue, but can not avoid the issue in stress testing,
                
> [Automation] Libvirt failed find primary  storage and VM deployment failed  
> ----------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-2729
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2729
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Storage Controller
>    Affects Versions: 4.2.0
>         Environment: Master branch build
> Automation environment - KVM  
>            Reporter: Rayees Namathponnan
>            Priority: Blocker
>             Fix For: 4.2.0
>
>         Attachments: CLOUDSTACK-2729.rar
>
>
> Failed to deploy VM in automation, environment created with , 2 hosts and 2 
> primary storages in a cluster 
> 1) /export/home/rayees/SC_QA_AUTO4/primary2
> 2) /export/home/rayees/SC_QA_AUTO4/primary
> Libvirt failed to find the primary storage 
> /export/home/rayees/SC_QA_AUTO4/primary and VM deployment failed 
> MS log
> 2013-05-28 21:11:44,540 DEBUG [agent.transport.Request] (consoleproxy-1:null) 
> Seq 4-936706756: Sending  { Cmd , MgmtId: 29066118877352, via: 4, Ver: v1, 
> Flags: 100111, 
> [{"StartCommand":{"vm":{"id":29,"name":"v-29-QA","type":"ConsoleProxy","cpus":1,"minSpeed":500,"maxSpeed":500,"minRam":1073741824,"maxRam":1073741824,"arch":"x86_64","os":"Debian
>  GNU/Linux 5.0 (32-bit)","bootArgs":" template=domP type=consoleproxy 
> host=10.223.49.195 port=8250 name=v-29-QA premium=true zone=1 pod=1 
> guid=Proxy.29 proxy_vm=29 disable_rp_filter=true eth2ip=10.223.122.73 
> eth2mask=255.255.255.192 gateway=10.223.122.65 eth0ip=169.254.0.154 
> eth0mask=255.255.0.0 eth1ip=10.223.50.96 eth1mask=255.255.255.192 
> mgmtcidr=10.223.49.192/26 localgw=10.223.50.65 internaldns1=10.223.110.254 
> dns1=72.52.126.11","rebootOnCrash":false,"enableHA":false,"limitCpuUse":false,"vncPassword":"d099568827911cef","params":{},"uuid":"5a146833-6a8c-44e5-83c0-50f34accf513","disks":[{"id":32,"name":"ROOT-29","mountPoint":"/export/home/rayees/SC_QA_AUTO4/primary","path":"f6f8d865-e9c0-4188-8a33-6c6383ca5075","size":276406784,"type":"ROOT","storagePoolType":"NetworkFilesystem","storagePoolUuid":"fff90cb5-06dd-33b3-8815-d78c08ca01d9","deviceId":0}],"nics":[{"deviceId":2,"networkRateMbps":-1,"defaultNic":true,"uuid":"21175978-96bd-4160-8228-8ad15aa40c66","ip":"10.223.122.73","netmask":"255.255.255.192","gateway":"10.223.122.65","mac":"06:2e:5a:00:00:42","dns1":"72.52.126.11","broadcastType":"Vlan","type":"Public","broadcastUri":"vlan://1221","isolationUri":"vlan://1221","isSecurityGroupEnabled":false},{"deviceId":0,"networkRateMbps":-1,"defaultNic":false,"uuid":"e0651452-a76e-4564-96b2-3d5d51e9bcd6","ip":"169.254.0.154","netmask":"255.255.0.0","gateway":"169.254.0.1","mac":"0e:00:a9:fe:00:9a","broadcastType":"LinkLocal","type":"Control","isSecurityGroupEnabled":false},{"deviceId":1,"networkRateMbps":-1,"defaultNic":false,"uuid":"0765c229-468e-4dfd-8382-24ac49791a8d","ip":"10.223.50.96","netmask":"255.255.255.192","gateway":"10.223.50.65","mac":"06:a8:e8:00:00:1d","broadcastType":"Native","type":"Management","isSecurityGroupEnabled":false}]},"hostIp":"10.223.50.66","wait":0}},{"check.CheckSshCommand":{"ip":"169.254.0.154","port":3922,"interval":6,"retries":100,"name":"v-29-QA","wait":0}}]
>  }
> 2013-05-28 21:11:44,552 DEBUG [agent.transport.Request] 
> (AgentManager-Handler-1:null) Seq 4-936706756: Processing:  { Ans: , MgmtId: 
> 29066118877352, via: 4, Ver: v1, Flags: 110, 
> [{"Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
>  org.libvirt.LibvirtException: Storage pool not found: no pool with matching 
> uuid\n\tat 
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.getStoragePool(LibvirtStorageAdaptor.java:380)\n\tat
>  
> com.cloud.hypervisor.kvm.storage.KVMStoragePoolManager.getStoragePool(KVMStoragePoolManager.java:72)\n\tat
>  
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.createVbd(LibvirtComputingResource.java:3399)\n\tat
>  
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:3293)\n\tat
>  
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1187)\n\tat
>  com.cloud.agent.Agent.processRequest(Agent.java:525)\n\tat 
> com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:852)\n\tat 
> com.cloud.utils.nio.Task.run(Task.java:83)\n\tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)\n\tat
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)\n\tat
>  
> java.lang.Thread.run(Thread.java:679)\n","wait":0}},{"Answer":{"result":false,"details":"Stopped
>  by previous failure","wait":0}}] }
> 2013-05-28 21:11:44,552 DEBUG [agent.manager.AgentAttache] 
> (AgentManager-Handler-1:null) Seq 4-936706756: No more commands found
> 2013-05-28 21:11:44,552 DEBUG [agent.transport.Request] (consoleproxy-1:null) 
> Seq 4-936706756: Received:  { Ans: , MgmtId: 29066118877352, via: 4, Ver: v1, 
> Flags: 110, { Answer, Answer } }
> 2013-05-28 21:11:44,559 ERROR [cloud.vm.VirtualMachineManagerImpl] 
> (consoleproxy-1:null) Failed to start instance VM[ConsoleProxy|v-29-QA]
> com.cloud.utils.exception.CloudRuntimeException: Unable to get answer that is 
> of class com.cloud.agent.api.StartAnswer
>         at com.cloud.agent.manager.Commands.getAnswer(Commands.java:80)
>         at 
> com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:873)
>         at 
> com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:550)
>         at 
> com.cloud.vm.VirtualMachineManagerImpl.start(VirtualMachineManagerImpl.java:543)
>         at 
> com.cloud.consoleproxy.ConsoleProxyManagerImpl.startProxy(ConsoleProxyManagerImpl.java:561)
>         at 
> com.cloud.consoleproxy.ConsoleProxyManagerImpl.allocCapacity(ConsoleProxyManagerImpl.java:923)
>         at 
> com.cloud.consoleproxy.ConsoleProxyManagerImpl.expandPool(ConsoleProxyManagerImpl.java:1667)
>         at 
> com.cloud.consoleproxy.ConsoleProxyManagerImpl.expandPool(ConsoleProxyManagerImpl.java:157)
>         at 
> com.cloud.vm.SystemVmLoadScanner.loadScan(SystemVmLoadScanner.java:111)
> Agent log
> 2013-05-29 01:06:07,911 DEBUG [cloud.agent.Agent] 
> (agentRequest-Handler-4:null) Request:Seq 4-936706739:  { Cmd , MgmtId: 
> 29066118877352, via: 4, Ver: v1, Flags: 100111, 
> [{"StartCommand":{"vm":{"id":29,"name":"v-29-QA","type":"ConsoleProxy","cpus":1,"minSpeed":500,"maxSpeed":500,"minRam":1073741824,"maxRam":1073741824,"arch":"x86_64","os":"Debian
>  GNU/Linux 5.0 (32-bit)","bootArgs":" template=domP type=consoleproxy 
> host=10.223.49.195 port=8250 name=v-29-QA premium=true zone=1 pod=1 
> guid=Proxy.29 proxy_vm=29 disable_rp_filter=true eth2ip=10.223.122.73 
> eth2mask=255.255.255.192 gateway=10.223.122.65 eth0ip=169.254.1.243 
> eth0mask=255.255.0.0 eth1ip=10.223.50.93 eth1mask=255.255.255.192 
> mgmtcidr=10.223.49.192/26 localgw=10.223.50.65 internaldns1=10.223.110.254 
> dns1=72.52.126.11","rebootOnCrash":false,"enableHA":false,"limitCpuUse":false,"vncPassword":"d099568827911cef","params":{},"uuid":"5a146833-6a8c-44e5-83c0-50f34accf513","disks":[{"id":32,"name":"ROOT-29","mountPoint":"/export/home/rayees/SC_QA_AUTO4/primary","path":"f6f8d865-e9c0-4188-8a33-6c6383ca5075","size":276406784,"type":"ROOT","storagePoolType":"NetworkFilesystem","storagePoolUuid":"fff90cb5-06dd-33b3-8815-d78c08ca01d9","deviceId":0}],"nics":[{"deviceId":2,"networkRateMbps":-1,"defaultNic":true,"uuid":"21175978-96bd-4160-8228-8ad15aa40c66","ip":"10.223.122.73","netmask":"255.255.255.192","gateway":"10.223.122.65","mac":"06:2e:5a:00:00:42","dns1":"72.52.126.11","broadcastType":"Vlan","type":"Public","broadcastUri":"vlan://1221","isolationUri":"vlan://1221","isSecurityGroupEnabled":false},{"deviceId":0,"networkRateMbps":-1,"defaultNic":false,"uuid":"e0651452-a76e-4564-96b2-3d5d51e9bcd6","ip":"169.254.1.243","netmask":"255.255.0.0","gateway":"169.254.0.1","mac":"0e:00:a9:fe:01:f3","broadcastType":"LinkLocal","type":"Control","isSecurityGroupEnabled":false},{"deviceId":1,"networkRateMbps":-1,"defaultNic":false,"uuid":"0765c229-468e-4dfd-8382-24ac49791a8d","ip":"10.223.50.93","netmask":"255.255.255.192","gateway":"10.223.50.65","mac":"06:49:6a:00:00:1a","broadcastType":"Native","type":"Management","isSecurityGroupEnabled":false}]},"hostIp":"10.223.50.66","wait":0}},{"check.CheckSshCommand":{"ip":"169.254.1.243","port":3922,"interval":6,"retries":100,"name":"v-29-QA","wait":0}}]
>  }
> 2013-05-29 01:06:07,912 DEBUG [cloud.agent.Agent] 
> (agentRequest-Handler-4:null) Processing command: 
> com.cloud.agent.api.StartCommand
> 2013-05-29 01:06:07,923 WARN  [cloud.agent.Agent] 
> (agentRequest-Handler-4:null) Caught:
> com.cloud.utils.exception.CloudRuntimeException: 
> org.libvirt.LibvirtException: Storage pool not found: no pool with matching 
> uuid
>         at 
> com.cloud.hypervisor.kvm.storage.LibvirtStorageAdaptor.getStoragePool(LibvirtStorageAdaptor.java:380)
>         at 
> com.cloud.hypervisor.kvm.storage.KVMStoragePoolManager.getStoragePool(KVMStoragePoolManager.java:72)
>         at 
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.createVbd(LibvirtComputingResource.java:3399)
>         at 
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.execute(LibvirtComputingResource.java:3293)
>         at 
> com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1187)
>         at com.cloud.agent.Agent.processRequest(Agent.java:525)
>         at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:852)
>         at com.cloud.utils.nio.Task.run(Task.java:83)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> > create VM issuing create command with primary storage 
> > “/export/home/rayees/SC_QA_AUTO4/primary” but its not available in pool-list
> [root@Rack2Host11 ~]# virsh pool-list --all
> Name                 State      Autostart
> -----------------------------------------
> 1c9fcb9a-aece-453b-8510-4de4b4dbffdc active     no
> 41b632b5-40b3-3024-a38b-ea259c72579f active     no
> [root@Rack2Host11 ~]#
> [root@Rack2Host11 ~]#
> [root@Rack2Host11 ~]# virsh pool-dumpxml 1c9fcb9a-aece-453b-8510-4de4b4dbffdc
> <pool type='dir'>
>   <name>1c9fcb9a-aece-453b-8510-4de4b4dbffdc</name>
>   <uuid>1c9fcb9a-aece-453b-8510-4de4b4dbffdc</uuid>
>   <capacity unit='bytes'>237777190912</capacity>
>   <allocation unit='bytes'>2098745344</allocation>
>   <available unit='bytes'>235678445568</available>
>   <source>
>   </source>
>   <target>
>     <path>/var/lib/libvirt/images</path>
>     <permissions>
>       <mode>0700</mode>
>       <owner>-1</owner>
>       <group>-1</group>
>     </permissions>
>   </target>
> </pool>
> [root@Rack2Host11 ~]# virsh pool-dumpxml 41b632b5-40b3-3024-a38b-ea259c72579f
> <pool type='netfs'>
>   <name>41b632b5-40b3-3024-a38b-ea259c72579f</name>
>   <uuid>41b632b5-40b3-3024-a38b-ea259c72579f</uuid>
>   <capacity unit='bytes'>11810778316800</capacity>
>   <allocation unit='bytes'>9045745336320</allocation>
>   <available unit='bytes'>2765032980480</available>
>   <source>
>     <host name='10.223.110.232'/>
>     <dir path='/export/home/rayees/SC_QA_AUTO4/primary2'/>
>     <format type='auto'/>
>   </source>
>   <target>
>     <path>/mnt/41b632b5-40b3-3024-a38b-ea259c72579f</path>
>     <permissions>
>       <mode>0700</mode>
>       <owner>-1</owner>
>       <group>-1</group>
>     </permissions>
>   </target>
> </pool>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to