Thx both, thx Wei - that all sounds interesting. As for "vm migration fails and no retry in cloudstack" - from what I have seen so far, this should NOT trigger stopping the VM; the host will simply end up in ErrorInMaintenance. Can you confirm that VMs are not stopped in this case?
On Fri, 22 Nov 2019 at 08:54, Wei ZHOU <[email protected]> wrote:

> Hi Andrija,
>
> We have faced some vm migration issues. There are actually three categories:
>
> 1. vm migration fails due to different hardware or software on the source and
> destination hosts, for example, cpu models. The vm will still be running on
> the source host. You may find some errors in agent.log.
> 2. vm migration fails due to some libvirt/qemu bugs. You may find some
> errors in the /var/log/libvirt/qemu/ folder (on ubuntu) on the source or
> destination host. Mostly the vm will still be running on the source host.
> In rare cases the vm is stopped.
> 3. vm is stopped due to some cloudstack bugs. For example, when we put a
> host into maintenance, the vm will be stopped if (1) no other host is Up in
> the same cluster, or (2) vm migration fails and there is no retry in
> cloudstack, or (3) multiple vms are migrated to the same destination at the
> same time but there is not enough memory on the destination.
>
> We need to fix the issues mentioned in part 3 above in cloudstack.
>
> In Leaseweb, to improve the vm migration
> (1) we use a custom cpu model, see
> http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/master/hypervisor/kvm.html#configure-cpu-model-for-kvm-guest-optional
> (2) we have built our own qemu packages with some bug fixes for
> installation
> (3) we have some fixes in our fork from 4.7.1. We have not tested with
> 4.13/4.14.
> We still see failed vm migrations sometimes. However, the vms will not be
> stopped if migration fails.
>
> -Wei
>
> On Fri, 22 Nov 2019 at 01:54, Andrija Panic <[email protected]> wrote:
>
> > ( @Sven, not being able to migrate a VM with an ISO attached - I don't recall
> > testing/doing that recently - but it is technically perfectly possible, unless
> > we don't support it via CloudStack - feel free to open a GitHub issue with
> > correct steps to reproduce etc)
> >
> > On Fri, 22 Nov 2019 at 01:47, Andrija Panic <[email protected]> wrote:
> >
> > > That sucks...thx both.
> > >
> > > @both - which ACS version do you use (and encounter such issues)?
> > >
> > > Ubuntu comes with a whole other set of issues (I was losing my nerves
> > > around very idiotic things, last time a week ago...) - though most can be
> > > managed with some workarounds.
> > > But yes, Qemu/libvirt should be better with Ubuntu - free of RedHat
> > > s$^%tty business politics - i.e. in CentOS 6.x you were able to live
> > > migrate a VM WITH all the volumes to another host/storage. On CentOS 7 you
> > > can't do that any more, unless you are using qemu-kvm-ev (but not the
> > > regular one from the SIG CentOS repo, you need the one from the oVirt
> > > project)
> > >
> > > I'm just trying to understand if this is happening also on i.e. ACS 4.11 -
> > > so to stop digging around the problem (and assume it's purely CentOS which
> > > is broken - why all great things need to come to an end...damn it)
> > >
> > > (well I could also test the same ACS code on Ubuntu and see if there are
> > > no issues there with live migrations..)
> > >
> > > Thanks
> > > Andrija
> > >
> > > On Thu, 21 Nov 2019 at 23:39, Jean-Francois Nadeau <[email protected]>
> > > wrote:
> > >
> > >> Hi Andrija,
> > >>
> > >> We experienced that problem with stock packages on CentOS 7.4. Live
> > >> migration would frequently fail and leave the VM dead. We have since
> > >> moved to RHEV packages for qemu. Libvirt is still stock per CentOS 7.6
> > >> (4.5). I want to say the situation improved but I can't tell yet if we
> > >> have a 100% success rate on live migrations (as it should be !)
> > >>
> > >> Redhat has also been messing up severely with stock libvirt versions
> > >> between 7.4/7.5/7.6 in such a way that it broke live migration
> > >> compatibility (cpu definitions). I'm at the crossroads right now to
> > >> entirely ditch centos/redhat in favor of Ubuntu to have well tested
> > >> stock packages.
> > >>
> > >> best,
> > >>
> > >> -Jfn
> > >>
> > >> On Thu, Nov 21, 2019 at 5:25 PM Andrija Panic <[email protected]>
> > >> wrote:
> > >>
> > >> > Hi guys.
> > >> >
> > >> > I wanted to see if any of you have seen similar/same in master, as
> > >> > below.
> > >> >
> > >> > I've been testing some work/PRs (against the current master) and I've
> > >> > seen that VMs will crash/be stopped occasionally when live migration is
> > >> > happening. I experienced this on a NEW/EMPTY env, with 2 KVM hosts, and
> > >> > only SSVM and CPVM - not a capacity issue or similar.
> > >> >
> > >> > This is happening with CentOS 7 (CentOS 7.3 I believe, but we also
> > >> > updated packages to the latest stock ones and the same issue was
> > >> > happening again).
> > >> >
> > >> > This is still under investigation, but I was wondering if anyone else
> > >> > has seen a similar thing happening?
> > >> >
> > >> > Best,
> > >> >
> > >> > --
> > >> >
> > >> > Andrija Panić
> > >
> > > --
> > >
> > > Andrija Panić
> >
> > --
> >
> > Andrija Panić

--

Andrija Panić
