On Thu, Dec 10, 2015 at 11:04:54AM +0800, Lan, Tianyu wrote:
> On 12/10/2015 4:07 AM, Michael S. Tsirkin wrote:
> >On Thu, Dec 10, 2015 at 12:26:25AM +0800, Lan, Tianyu wrote:
> >>On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
> >>>I thought about what this is doing at the high level, and I do see
> >>>some value in what you are trying to do, but I also think we need to
> >>>clarify the motivation a bit more. What you are saying is not really
> >>>what the patches are doing.
> >>>
> >>>And with that clearer understanding of the motivation in mind
> >>>(assuming it actually captures a real need), I would also like to
> >>>suggest some changes.
> >>
> >>Motivation:
> >>Most current solutions for migration with a passthrough device are
> >>based on PCI hotplug, but it has side effects and can't work for all
> >>devices.
> >>
> >>For NIC devices:
> >>The PCI hotplug solution can work around network device migration
> >>by switching between the VF and PF.
> >
> >This is just more confusion. Hotplug is just a way to add and remove
> >devices. Switching between VF and PF is up to the guest and hypervisor.
>
> This is a combination. Because it's not possible to migrate device
> state in the current world during migration (which is what we are
> doing), existing solutions for migrating a VM with a passthrough NIC
> rely on PCI hotplug.
That's where you go wrong, I think. This marketing speak about a
"solution for migrating a VM with passthrough" is just confusing people.
There's no way to do migration with device passthrough on KVM at the
moment, in particular because of the lack of a way for the host to save
and restore device state, and you do not propose a way either.

So how do people migrate? They stop doing device passthrough. So what I
think your patches do is add the ability to do the two things in
parallel: stop doing passthrough and start migration. You still can not
migrate with passthrough.

> Unplug the VF before starting migration and then switch the network
> from the VF NIC to the PV NIC in order to maintain the network
> connection.

Again, this is mixing unrelated things. This switching is not really
related to migration. You can do this at any time for any number of
reasons. If migration takes a lot of time and if you unplug before
migration, then switching to another interface might make sense. But
it's a question of policy.

> Plug the VF again after migration and then switch from PV back to VF.
> The bond driver provides a way to switch between the PV and VF NICs
> automatically with the same IP and MAC, and so the bond driver is
> preferred.

Preferred over switching manually? As long as it works well, sure. But
one can come up with other techniques. For example, don't switch: save
the IP, MAC, etc., remove the source device and add the destination one.

You were also complaining that the switch took too long.

> >
> >>But switching network interfaces will introduce service downtime.
> >>
> >>I tested the service downtime by putting the VF and PV interfaces
> >>into a bonded interface and pinging the bonded interface while
> >>plugging and unplugging the VF:
> >>1) About 100ms when adding the VF
> >>2) About 30ms when removing the VF
> >
> >OK, and what's the source of the downtime?
> >I'm guessing that's just the ARP cache being repopulated. So simply
> >save and re-populate it.
> >
> >There would be a much cleaner solution.
> >
> >Or maybe there's a timer there that just delays hotplug
> >for no reason. Fix it, and everyone will benefit.
> >
> >>It also requires the guest to do switch configuration.
> >
> >That's just wrong. If you want a switch, you need to
> >configure a switch.
>
> I meant the configuration of the switching operation between PV and VF.

I see. So sure, there are many ways to configure networking on Linux.
You seem to see this as a downside and so want to hardcode a single
configuration into the driver.

> >
> >>These are hard to manage and deploy for our customers.
> >
> >So the kernel wants to remain flexible, and the stack is
> >configurable. Downside: customers need to deploy userspace
> >to configure it. Your solution: a hard-coded configuration
> >within the kernel and hypervisor. Sorry, this makes no sense.
> >If the kernel is easier for you to deploy than userspace,
> >you need to rethink your deployment strategy.
>
> This is one factor.
>
> >
> >>To maintain PV performance during migration, the host side also
> >>needs to assign a VF to the PV device. This affects scalability.
> >
> >No idea what this means.
> >
> >>These factors block SR-IOV NIC passthrough usage in cloud services
> >>and OPNFV, which require high network performance and stability.
> >
> >Everyone needs performance and scalability.
> >
> >>
> >>For other kinds of devices, it's hard to make this work.
> >>We are also adding migration support for the QAT (QuickAssist
> >>Technology) device.
> >>
> >>QAT device use case introduction:
> >>Server, networking, big data, and storage applications use
> >>QuickAssist Technology to offload servers from handling
> >>compute-intensive operations, such as:
> >>1) Symmetric cryptography functions including cipher operations and
> >>authentication operations
> >>2) Public key functions including RSA, Diffie-Hellman, and elliptic
> >>curve cryptography
> >>3) Compression and decompression functions including DEFLATE and LZS
> >>
> >>PCI hotplug will not work for such devices during migration, and
> >>these operations will fail when the device is unplugged.
> >>
> >>So we are trying to implement a new solution which really migrates
> >>device state to the target machine and won't affect the user during
> >>migration, with low service downtime.
> >
> >Let's assume for the sake of the argument that there's a lot going on
> >and removing the device is just too slow (though you should figure out
> >what's going on before giving up and just building something new from
> >scratch).
>
> No, we can use a PV NIC as a backup for the VF NIC during migration,
> but that doesn't work for other kinds of devices since there is no
> backup for them. E.g. when migration happens while a user is
> compressing files via QAT, it's impossible to remove the QAT device at
> that point. If we do that, the compress operation will fail and affect
> the user experience.

I absolutely agree here. Switching to a PV device is just something
people can do. It must not be a requirement. Some people like doing
that, though. So it's policy, and should be left to userspace.

> >
> >I still don't think you should be migrating state. That's just too
> >fragile, and it also means you depend on the driver to be nice and
> >shut down the device on the source, so you can not migrate at will.
> >Instead, reset the device on the destination and re-initialize it.
> >
>
> Yes, saving and restoring device state relies on the driver, and so we
> rework the driver to make it more migration-friendly.

First, this remains very fragile.
At your lab you probably have tens of NIC devices with exactly the same
hardware and firmware version, but out there you can order a batch of 10
and get 10 slightly different versions. Will state saved from one work
on the other? One can be pretty sure no one will test all possible
combinations.

Let's assume you do save state and do have a way to detect whether the
state matches a given piece of hardware. For example, the driver could
store the firmware and hardware versions in the state, and then on the
destination, retrieve them and compare. It will be pretty common that
you have a mismatch, and you must not just fail migration. You need a
way to recover, maybe with more downtime.

Second, you can change the driver, but you can not be sure it will have
the chance to run at all. Host overload is a common reason to migrate
out of a host. You also can not trust the guest to do the right thing.
So how long do you want to wait until you decide the guest is not
cooperating and kill it? Most people will probably experiment a bit and
then add a bit of a buffer. This is not robust at all. Again, maybe you
ask the driver to save state, and if it does not respond for a while,
then you still migrate, and the driver has to recover on the
destination.

With the above in mind, you need to support two paths:

1. "good path": the driver stores state on the source, checks it on the
   destination, detects a match and restores state into the device

2. "bad path": the driver does not store state, or detects a mismatch
   on the destination. The driver has to assume the device was lost,
   and reset it.

So what I am saying is, implement the bad path first. Then the good
path is an optimization - measure whether it's faster, and by how much.

Also, it would be nice if on the bad path there was a way to switch to
another driver entirely, even if that means a bit more downtime. For
example, have a way for the driver to tell Linux it has to re-do
probing for the device.
-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html