Re: [Qemu-devel] live migration vs device assignment (motivation)

Lan, Tianyu Sun, 13 Dec 2015 07:48:52 -0800


On 12/11/2015 1:16 AM, Alexander Duyck wrote:

On Thu, Dec 10, 2015 at 6:38 AM, Lan, Tianyu <[email protected]> wrote:



On 12/10/2015 7:41 PM, Dr. David Alan Gilbert wrote:


Ideally, it is able to leave guest driver unmodified but it requires the

hypervisor or qemu to aware the device which means we may need a driver
in
hypervisor or qemu to handle the device on behalf of guest driver.


Can you answer the question of when do you use your code -
     at the start of migration or
     just before the end?



Just before stopping VCPU in this version and inject VF mailbox irq to
notify the driver if the irq handler is installed.
Qemu side also will check this via the faked PCI migration capability
and driver will set the status during device open() or resume() callback.


The VF mailbox interrupt is a very bad idea.  Really the device should
be in a reset state on the other side of a migration.  It doesn't make
sense to have the interrupt firing if the device is not configured.
This is one of the things that is preventing you from being able to
migrate the device while the interface is administratively down or the
VF driver is not loaded.


From my opinion, if VF driver is not loaded and hardware doesn't start
to work, the device state doesn't need to be migrated.

We may add a flag for driver to check whether migration happened duringit's down and reinitialize the hardware and clear the flag when systemtry to put it up.

We may add migration core in the Linux kernel and provide some helpsfunctions to facilitate to add migration support for drivers.

Migration core is in charge to sync status with Qemu.

Example.
migration_register()
Driver provides
- Callbacks to be called before and after migration or for bad path
- Its irq which it prefers to deal with migration event.

migration_event_check()
Driver calls it in the irq handler. Migration core code will check
migration status and call its callbacks when migration happens.


My thought on all this is that it might make sense to move this
functionality into a PCI-to-PCI bridge device and make it a
requirement that all direct-assigned devices have to exist behind that
device in order to support migration.  That way you would be working
with a directly emulated device that would likely already be
supporting hot-plug anyway.  Then it would just be a matter of coming
up with a few Qemu specific extensions that you would need to add to
the device itself.  The same approach would likely be portable enough
that you could achieve it with PCIe as well via the same configuration
space being present on the upstream side of a PCIe port or maybe a
PCIe switch of some sort.

It would then be possible to signal via your vendor-specific PCI
capability on that device that all devices behind this bridge require
DMA page dirtying, you could use the configuration in addition to the
interrupt already provided for hot-plug to signal things like when you
are starting migration, and possibly even just extend the shpc
functionality so that if this capability is present you have the
option to pause/resume instead of remove/probe the device in the case
of certain hot-plug events.  The fact is there may be some use for a
pause/resume type approach for PCIe hot-plug in the near future
anyway.  From the sounds of it Apple has required it for all
Thunderbolt device drivers so that they can halt the device in order
to shuffle resources around, perhaps we should look at something
similar for Linux.

The other advantage behind grouping functions on one bridge is things
like reset domains.  The PCI error handling logic will want to be able
to reset any devices that experienced an error in the event of
something such as a surprise removal.  By grouping all of the devices
you could disable/reset/enable them as one logical group in the event
of something such as the "bad path" approach Michael has mentioned.


These sounds we need to add a faked bridge for migration and adding a
driver in the guest for it. It also needs to extend PCI bus/hotplug
driver to do pause/resume other devices, right?

My concern is still that whether we can change PCI bus/hotplug like that
without spec change.

IRQ should be general for any devices and we may extend it for
migration. Device driver also can make decision to support migration
or not.

It would be great if we could avoid changing the guest; but at least
your guest
driver changes don't actually seem to be that hardware specific;
could your
changes actually be moved to generic PCI level so they could be made
to work for lots of drivers?


It is impossible to use one common solution for all devices unless the
PCIE
spec documents it clearly and i think one day it will be there. But
before
that, we need some workarounds on guest driver to make it work even it
looks
ugly.



Yes, so far there is not hardware migration support and it's hard to modify
bus level code. It also will block implementation on the Windows.


Please don't assume things.  Unless you have hard data from Microsoft
that says they want it this way lets just try to figure out what works
best for us for now and then we can start worrying about third party
implementations after we have figured out a solution that actually
works.

- Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] live migration vs device assignment (motivation)

Reply via email to