Re: Primary store (PS) failure/outage

Wido den Hollander Mon, 16 Sep 2013 04:56:03 -0700

On 09/16/2013 01:21 PM, Koushik Das wrote:

Currently the way Cloudstack deals with PS failure is to reboot all hosts 
associated with the cluster. Selectively cleaning up the affected VMs would 
have been the best option, but since issues were seen with stopping VMs on the 
hypervisors (at least in Xenserver 5.6 [1]) reboot was the next option. The 
downside of this approach is that if there is more than one PS in the cluster, 
healthy VMs are unnecessarily affected by the host reboots.


Recently I tried this scenario using both XS 6.1 and 6.2. On 6.1 I think the 
behaviour is similar to 5.6: if the PS is not available then any operation on 
the VM, like shutdown, would hang (I waited for more than 30 mins and the 
operation was still stuck). But on 6.2 it looks like these scenarios are 
handled more gracefully. In 6.2, on doing a shutdown the VM's power state was 
changed to 'halted' and then it was even possible to destroy the VM. Based on 
this I think that at least for XS 6.2 we can do a selective VM cleanup if the 
PS is not available. For older XS versions the existing approach would still 
be used.
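
To make the proposal above concrete, here is a rough sketch of the decision it describes: selective cleanup on XS 6.2 and later, host reboot otherwise. All names (Vm, plan_recovery, SELECTIVE_CLEANUP_MIN_XS) are made up for illustration and are not CloudStack code; the 6.2 threshold is taken only from the test results reported in this mail.

```python
from dataclasses import dataclass

# Hypothetical threshold: the first XS version on which shutdown/destroy
# was observed to work while the primary store (PS) was unavailable.
SELECTIVE_CLEANUP_MIN_XS = (6, 2)


@dataclass
class Vm:
    name: str
    primary_store: str  # identifier of the PS holding the VM's disks


def parse_version(text):
    """Turn a dotted version string such as '6.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def plan_recovery(xs_version, failed_ps, vms):
    """Return (action, vms_hit) for a host whose PS `failed_ps` went away.

    action is "cleanup" (stop only the VMs on the failed PS) or
    "reboot" (existing behaviour: every VM on the host is hit).
    """
    if parse_version(xs_version) >= SELECTIVE_CLEANUP_MIN_XS:
        affected = [vm for vm in vms if vm.primary_store == failed_ps]
        return ("cleanup", affected)
    return ("reboot", list(vms))
```

With two primary stores in the cluster, only the VMs on the failed one are touched on 6.2, while older versions still fall back to rebooting the host.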

Thoughts/comments?

Also for KVM the same approach is used. Can someone let me know if newer 
versions of KVM can handle primary storage failure in a better way with 
respect to VM operations? In that case the behaviour can be changed for KVM 
as well.

I can't comment on this specifically, but when you are using NFS your Qemu process will go into status "D" and can't be killed.


So that will lead to the only other option: Reboot the host
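
For reference, status "D" is the Linux uninterruptible-sleep state, and it can be spotted before deciding anything by reading the state field of /proc/<pid>/stat. A minimal sketch, assuming a Linux KVM host; the helper names are made up for illustration and the caller has to supply the Qemu PID:

```python
def parse_proc_state(stat_line):
    """Extract the one-letter process state from a /proc/<pid>/stat line.

    The second field (the command name) is wrapped in parentheses and may
    itself contain spaces or parentheses, so split after the last ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def is_uninterruptible(pid):
    """True if the process is in state 'D' (uninterruptible sleep)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return parse_proc_state(stat_file.read()) == "D"
```

A process stuck in this state is typically waiting on I/O (here, the unreachable NFS server) and will not act on SIGKILL until that I/O completes.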

With NFS though, you can stop the NFS server and bring it back 2 hours later and with KVM all the VMs will recover within 15 min without any intervention.


CS shouldn't start doing all kinds of things when PS fails.

Wido

For VMware, since it is an externally managed cluster, I don't think this 
issue exists.

Thanks,
Koushik

[1] https://issues.apache.org/jira/browse/CLOUDSTACK-3367
[2] http://comments.gmane.org/gmane.comp.apache.cloudstack.user/4254
