Edison, Please review the patch: https://reviews.apache.org/r/13223/
-Wei

2013/8/2 Wei ZHOU <[email protected]>

> Alex,
>
> Exactly.
>
> We can also use Enable Maintenance -> umount nfs point -> restart
> cloudstack-agent -> Cancel Maintenance to solve this issue.
>
> -Wei
>
>
> 2013/8/2 Alex Huang <[email protected]>
>
>> So I have very limited knowledge of KVM. But, from what I understand from
>> Edison, we should consider what has to be done to fix this problem once it
>> occurs.
>>
>> - Shut down all VMs on all hosts that are affected.
>> - umount the nfs mount point
>> - Reestablish the storage pool.
>> - Restart the VMs.
>>
>> Given how severe these actions are to the end user, I would vote for the
>> file lock to ensure it never happens, even if it's slower.
>>
>> --Alex
>>
>> > -----Original Message-----
>> > From: Wei ZHOU [mailto:[email protected]]
>> > Sent: Tuesday, July 16, 2013 3:35 AM
>> > To: [email protected]
>> > Subject: Re: How to fix libvirt storage pool refresh issue?
>> >
>> > I agree with Wido.
>> >
>> > Moreover, the file lock will cause performance degradation of VM
>> > deployment.
>> >
>> > -Wei
>> >
>> >
>> > 2013/7/16 Wido den Hollander <[email protected]>
>> >
>> > > On 07/16/2013 12:27 AM, Marcus Sorensen wrote:
>> > >
>> > >> I'm ok with a symptom fix on our end, if the root cause is in
>> > >> Libvirt we can't do much about that. This is the sort of patch that
>> > >> tends to get pulled into the regular update cycle of the
>> > >> distributions, so unless there's more to it and it's not a good fix I
>> > >> imagine we will see it come through without having to wait for the
>> > >> next point releases. We still have to support existing users who
>> > >> might not be running the latest, though, so the symptom fix is
>> > >> probably ok as a temporary measure.
>> > >>
>> > >
>> > > I'm ok with not calling storagePoolRefresh every time we want a
>> > > capacity update, since that's also kind of I/O intensive for larger
>> > > storage arrays.
>> > >
>> > > However, we should make sure we have a GOOD comment in the code
>> > > about this "fix", since that's the reason I initially removed the
>> > > old code which invoked "df".
>> > >
>> > > I'll see if I can get this libvirt patch into Ubuntu when it hits
>> > > libvirt upstream, since this bug is really annoying.
>> > >
>> > > Wido
>> > >
>> > >
>> > >
>> > >> On Mon, Jul 15, 2013 at 3:42 PM, Edison Su <[email protected]> wrote:
>> > >>
>> > >>> There is a serious issue on KVM
>> > >>> (https://issues.apache.org/jira/browse/CLOUDSTACK-2729):
>> > >>> a libvirt storage pool can disappear on a KVM host, and it's easy
>> > >>> to reproduce in our internal QA environment.
>> > >>> Wei found the root cause, which is in libvirt:
>> > >>> "
>> > >>> This is a libvirt issue. I created a ticket for it.
>> > >>> https://bugzilla.redhat.com/show_bug.cgi?id=977706
>> > >>> The patch is very simple.
>> > >>> https://www.redhat.com/archives/libvir-list/2013-July/msg00635.html
>> > >>> "
>> > >>> But it's also introduced by CloudStack, as CloudStack calls the
>> > >>> libvirt storage pool refresh method each time it accesses the
>> > >>> storage pool. The code was added by commit
>> > >>> 2ffc9907f7b0d371737e39b7649f7af23026f5cf,
>> > >>> less than one year ago.
>> > >>>
>> > >>> As Wei suggested, we can call storage pool refresh only if needed.
>> > >>> That will mitigate the issue (it's the behavior I implemented in
>> > >>> CloudStack pre-4.0), but it only treats the symptom, not the cause.
>> > >>> Or we can add a cluster-wide lock, so that only one caller can
>> > >>> access the storage pool at a time; we can add a file lock on NFS
>> > >>> primary storage.
>> > >>> Any idea/feedback on how to fix this KVM issue?
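
For illustration only, here is a minimal sketch of the two mitigations discussed in the thread: refreshing the storage pool only when needed, and serializing refreshes behind a file lock. It is not the code from the patch under review. The names pool_lock, ThrottledRefresher, min_interval and the lock-file path are hypothetical, and the refresh callable stands in for whatever actually invokes the libvirt storage pool refresh. Whether an flock() taken on a file on NFS primary storage is honored across hosts depends on the NFS client and server, so the cluster-wide behavior is an assumption here.

```python
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def pool_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other holder has the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


class ThrottledRefresher:
    """Call refresh() at most once per min_interval seconds, under a file lock.

    refresh is a placeholder callable (hypothetical); in a real agent it
    would trigger the libvirt storage pool refresh.
    """

    def __init__(self, refresh, lock_path, min_interval=60.0,
                 clock=time.monotonic):
        self._refresh = refresh
        self._lock_path = lock_path
        self._min_interval = min_interval
        self._clock = clock
        self._last = None  # time of the last successful refresh

    def refresh_if_needed(self, force=False):
        """Return True if a refresh ran, False if it was skipped."""
        now = self._clock()
        recent = (self._last is not None
                  and now - self._last < self._min_interval)
        if recent and not force:
            return False
        with pool_lock(self._lock_path):
            self._refresh()
        self._last = self._clock()
        return True
```

The throttle state lives in one process, so it only reduces how often a single agent refreshes; the lock is what keeps concurrent refreshes apart. As Wei notes above, waiting on the lock can slow VM deployment, which is the trade-off Alex accepts in his vote.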
