On Fri, Dec 25, 2015 at 03:03:47PM +0800, Lan Tianyu wrote:
> Merry Christmas.
> Sorry for the late response due to personal affairs.
> On December 14, 2015 at 03:30, Alexander Duyck wrote:
> >> > This sounds like we need to add a fake bridge for migration and a
> >> > driver in the guest for it. It also would require extending the PCI
> >> > bus/hotplug driver to pause/resume other devices, right?
> >> >
> >> > My concern is still whether we can change the PCI bus/hotplug
> >> > behavior like that without a spec change.
> >> >
> >> > IRQs should be generic for any device, and we may extend them for
> >> > migration. The device driver can also decide whether or not to
> >> > support migration.
> > The device should have no say in the matter.  Either we are going to
> > migrate or we will not.  This is why I have suggested my approach as
> > it allows for the least amount of driver intrusion while providing the
> > maximum number of ways to still perform migration even if the device
> > doesn't support it.
> Even if the device driver doesn't support migration, you still want to
> migrate the VM? That may be risky, and we should at least add the "bad
> path" for the driver.
> > 
> > The solution I have proposed is simple:
> > 
> > 1.  Extend swiotlb to allow for a page dirtying functionality.
> > 
> >      This part is pretty straightforward.  I'll submit a few patches
> > later today as an RFC that provides the minimal functionality needed
> > for this.
> I'd very much appreciate that.
> > 
> > 2.  Provide a vendor specific configuration space option on the QEMU
> > implementation of a PCI bridge to act as a bridge between directly
> > assigned devices and the host bridge.
> > 
> >      My thought was to add some vendor specific block that includes a
> > capabilities, status, and control register so you could go through and
> > synchronize things like the DMA page dirtying feature.  The bridge
> > itself could manage the migration capable bit inside QEMU for all
> > devices assigned to it.  So if you added a VF to the bridge it would
> > flag that you can support migration in QEMU, while the bridge would
> > indicate you cannot until the DMA page dirtying control bit is set by
> > the guest.
> > 
> >      We could also go through and optimize the DMA page dirtying after
> > this is added so that we can narrow down the scope of use, and as a
> > result improve the performance for other devices that don't need to
> > support migration.  It would then be a matter of adding an interrupt
> > in the device to handle an event such as the DMA page dirtying status
> > bit being set in the config space status register, while the bit is
> > not set in the control register.  If it doesn't get set then we would
> > have to evict the devices before the warm-up phase of the migration,
> > otherwise we can defer it until the end of the warm-up phase.
> > 
> > 3.  Extend the existing shpc driver to support the optional "pause"
> > functionality as called out in section 4.1.2 of the Revision 1.1 PCI
> > hot-plug specification.
> Since your solution already adds a fake PCI bridge, why not notify the
> bridge directly during migration via an IRQ and call the device
> driver's callback from the new bridge driver?
> Alternatively, the new bridge driver could check whether the device
> driver provides migration callbacks and, if so, call them to improve
> the passthrough device's performance during migration.

As long as you keep up this vague talk about performance during
migration, without even bothering with any measurements, this patchset
will keep going nowhere.

There's Alex's patch that tracks memory changes during migration.  It
needs some simple enhancements to be useful in production (e.g. add a
host/guest handshake both to enable tracking in the guest and to detect
the support in the host); then it can allow starting migration with an
assigned device, by invoking hot-unplug after most of memory has been
migrated.

Please implement this in qemu and measure the speed.
I will not be surprised if destroying/creating a netdev in Linux
turns out to take too long, but until someone bothers to
check, it does not make sense to discuss further enhancements.

> > 
> >      Note I call out "extend" here instead of saying to add this.
> > Basically what we should do is provide a means of quiescing the device
> > without unloading the driver.  This is called out as something the OS
> > vendor can optionally implement in the PCI hot-plug specification.  On
> > OSes that wouldn't support this it would just be treated as a standard
> > hot-plug event.   We could add a capability, status, and control bit
> > in the vendor specific configuration block for this as well; setting
> > the status bit would indicate the host wants to pause instead of
> > remove, and the control bit would indicate the guest supports "pause"
> > in the OS.  We could then optionally disable guest migration while the
> > VF is present and pause is not supported.
> > 
> >      To support this we would need to add a timer: if a new device
> > is not inserted within some period of time (60 seconds, for example),
> > or if a different device is inserted, we need to unload the original
> > driver from the device.  In addition we would need to verify whether
> > drivers can call the remove function after having called suspend
> > without resume.
> > If not, we could look at adding a recovery function to remove the
> > driver from the device in the case of a suspend with either a failed
> > resume or no resume call.  Once again it would probably be useful to
> > have for those cases where power management suspend/resume runs into
> > an issue like somebody causing a surprise removal while a device was
> > suspended.
> -- 
> Best regards
> Tianyu Lan
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html