There is a serious issue on KVM (https://issues.apache.org/jira/browse/CLOUDSTACK-2729): a libvirt storage pool can disappear on a KVM host, and it is easy to reproduce in our internal QA environment. Wei found the root cause, which is in libvirt: "This is a libvirt issue. I created a ticket for it. https://bugzilla.redhat.com/show_bug.cgi?id=977706 The patch is very simple. https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html" But CloudStack also contributes to the problem, because it calls the libvirt storage pool refresh method every time it accesses the storage pool. That code was added by commit 2ffc9907f7b0d371737e39b7649f7af23026f5cf, a little less than a year ago.
As Wei suggested, we can call storage pool refresh only when it is needed. That will mitigate the issue (it is the behavior I implemented in CloudStack pre-4.0), but it only treats the symptom, not the cause. Alternatively, we can add a cluster-wide lock so that only one host can access the storage pool at a time; for NFS primary storage, this could be a file lock on the NFS share. Any ideas/feedback on how to fix this KVM issue?
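
To make the two options concrete, here is a rough Java sketch. It is not the actual CloudStack code: the class PoolLockSketch, the PoolRefresher interface, and both method names are hypothetical stand-ins, and the real libvirt refresh call is left to the caller. It also assumes the NFS mount honors file locks taken through java.nio, which depends on the NFS version and mount options.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PoolLockSketch {
    /** Hypothetical stand-in for the libvirt storage pool refresh call. */
    interface PoolRefresher {
        void refresh() throws IOException;
    }

    /**
     * Option 1: refresh only when the caller knows the pool contents may
     * have changed, instead of on every access. Returns true if it refreshed.
     */
    static boolean refreshIfNeeded(boolean contentsMayHaveChanged,
                                   PoolRefresher refresher) throws IOException {
        if (!contentsMayHaveChanged) {
            return false;
        }
        refresher.refresh();
        return true;
    }

    /**
     * Option 2: serialize refreshes with an exclusive lock on a file that
     * lives on the shared primary storage (assumption: the mount supports
     * file locking). lock() blocks until the lock is acquired.
     */
    static void refreshUnderLock(Path lockFile,
                                 PoolRefresher refresher) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            refresher.refresh();
        } // the lock is released first, then the channel is closed
    }
}
```

The two can be combined (check whether a refresh is needed, then take the lock), but the lock adds latency on every refresh, so option 1 alone may be the cheaper mitigation until the libvirt patch is available.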