On 7/28/20 4:46 PM, Daniel Henrique Barboza wrote:


On 7/28/20 12:03 PM, Paulo de Rezende Pinatti wrote:
Context:

Libvirt can already detect the active VFs of an SRIOV PF device specified in a network definition and automatically assign these VFs to guests via an <interface> entry referring to that network in the domain definition. This functionality, however, depends on the system administrator having activated in advance the desired number of VFs outside of libvirt (either manually or through system scripts).

It would be more convenient if the VFs activation could also be managed inside libvirt so that the whole management of the VF pool is done exclusively by libvirt and in only one place (the network definition) rather than spread in different components of the system.

Proposal:

We can extend the existing network definition by adding a new tag <vf> as a child of the tag <pf> in order to allow the user to specify how many VFs they wish to have activated for the corresponding SRIOV device when the network is started. That would look like the following:

<network>
    <name>sriov-pool</name>
    <forward mode='hostdev' managed='yes'>
      <pf dev='eth1'>
         <vf num='10'/>
      </pf>
    </forward>
</network>

At XML definition time nothing gets changed on the system, as it is today. When the network is started with 'virsh net-start sriov-pool', libvirt will activate the desired number of VFs as specified in the tag <vf> of the network definition.

The operation might require resetting 'sriov_numvfs' to zero first in case the number of VFs currently active differs from the desired value.
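
For illustration, starting the network would then boil down to roughly the following (a sketch only, reusing 'eth1' and the count of 10 from the example above; the exact sysfs path depends on the device):

    # sketch of what net-start would do for <pf dev='eth1'><vf num='10'/>
    numvfs=/sys/class/net/eth1/device/sriov_numvfs
    current=$(cat "$numvfs")
    if [ "$current" -ne 10 ]; then
        # a non-zero value cannot be changed directly; reset to zero first
        [ "$current" -ne 0 ] && echo 0 > "$numvfs"
        echo 10 > "$numvfs"
    fi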


You don't specifically say it here, but any time sriov_numvfs is changed (and it must be changed by first setting it to 0, then back to the new number), *all* existing VFs are destroyed, and then recreated. And when a VF is recreated, it is a completely new device, and any previous use of the old device will be disrupted/forgotten/whatever - the exact behavior of any user of any of the previously existing devices is undefined, but it certainly will no longer work, and will be unrecoverable without starting over from scratch.


This means that any sort of API that can change sriov_numvfs has the potential to seriously mess up anything using the VFs, and so must take extra care to not do anything unless there's no possibility of that happening. Note that SR-IOV VFs aren't just used for assigning to guests with vfio. They can also be used for macvtap pass-through mode, and now for vdpa, and possibly/probably other things.


In order to avoid the situation where the user tries to start the network when a VF is already assigned to a running guest, the implementation will have to ensure that none of the existing VFs of the target PF are in use; otherwise VFs would be inadvertently hot-unplugged from guests upon network start. If any VF is in use, trying to start the network will result in an error.
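
As a very rough illustration of such a check, one could look at which driver each VF is bound to (a sketch only: this catches vfio-pci assignment but not macvtap or vdpa users, so a real implementation would have to consult libvirt's own list of running domains instead):

    # sketch: refuse to touch sriov_numvfs if any VF of eth1 looks in use
    for vf in /sys/class/net/eth1/device/virtfn*; do
        drv=$(basename "$(readlink -f "$vf/driver" 2>/dev/null)")
        if [ "$drv" = "vfio-pci" ]; then
            echo "VF $(basename "$(readlink -f "$vf")") appears to be assigned to a guest" >&2
            exit 1
        fi
    done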

I'm not sure about the 'echo 0 > sriov_numvfs' part. It works like that for Mellanox CX-4 and CX-5 cards, but I can't say it works like that for every other SR-IOV card out there.


It works that way for every SR-IOV card I've ever seen. If it isn't written in a standards document somewhere, it is at least a defacto standard.


Soon enough, we'll have to handle card-specific behavior to create the VFs.


If you're wondering if different cards create their VFs in different ways - at a lower level that is possibly the case. I know that in the past (before the sriov_totalvfs / sriov_numvfs sysfs interface existed) the way to create a certain number of VFs was to pass options to the PF driver, and the exact options were different for each vendor. The sysfs interface was at least partly intended to remedy that discrepancy between drivers.
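
From memory (so treat the exact module option as illustrative rather than authoritative), the contrast looked something like this:

    # old, vendor-specific way: a module parameter on the PF driver
    modprobe ixgbe max_vfs=10

    # current, driver-agnostic way: the same sysfs attribute for every driver
    echo 10 > /sys/class/net/eth1/device/sriov_numvfs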


Perhaps Laine can comment on this.

About the whole idea: it kind of changes the design of this network pool. As it is today, at least from my reading of [1], Libvirt will use any available VF from the pool and allocate it to the guest, coping with the existing host VF settings. Using this new option, Libvirt is now setting the VFs to a specific number, which might well be less than the current setting, disrupting the host for no apparent reason.

I would be on board with this idea if:

1 - The attribute is changed to "minimal VFs required for this pool" rather than "change the host to match this VF number". This means that we wouldn't tamper with the created VFs if the host already has more VFs than specified. In your example up there, setting 10 VFs, what if the host has 20 VFs? Why should Libvirt care about taking down 10 VFs that it wouldn't use in the first place?

2 - we find a universal way (or as close to universal as possible) to handle the creation of VFs.


Writing to sriov_numvfs is, afaik, the universal interface to create VFs.



3 - we guarantee that the process of VF creation, which will take down all existing VFs (e.g. with 'echo 0 > sriov_numvfs' on CX-5 cards), wouldn't disrupt the host in any way.


Definitely this would be a prerequisite to anything.




(1) is an easier sell. Rename the attribute to "vf minimalNum" or something like that, then refuse to net-start if the host has fewer VFs than the set amount (checking sriov_numvfs). Start the network if sriov_numvfs >= minimal. This would bring immediate value to the existing design, allowing the user to specify the minimum number of VFs they intend to consume from the pool.
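
A rough sketch of that check at net-start time (reusing eth1 and 10 from the example above; with this semantic libvirt only reads sriov_numvfs, it never writes to it):

    # sketch: with a "minimal VFs" attribute, net-start only verifies
    minimal=10
    active=$(cat /sys/class/net/eth1/device/sriov_numvfs)
    if [ "$active" -lt "$minimal" ]; then
        echo "network sriov-pool needs at least $minimal VFs on eth1, found $active" >&2
        exit 1
    fi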

(2) and (3) are more complicated. Especially (2).


A very long time ago this feature was discussed, and we decided that, since many users of VFs were doing so via <interface type='hostdev'> directly (managing the pool of VFs themselves rather than using the libvirt network driver), the functionality to create new VF devices would be useless to those "many users" if it was done by the network driver. Instead, we figured it would be more appropriate to implement it in the node-device driver, which already has an API to create and destroy devices. This way it would be of use to all those people using <interface type='hostdev'> (e.g. all OpenStack users).

The only problem is that the node-device driver at the time had no concept of persistent configuration (which would enable it to re-create the VFs at each host boot), so it would end up just being a thin wrapper over "echo 10 > /sys/.../sriov_numvfs" that would still need to be inserted into a host system startup file somewhere. Because of that, any implementation of the functionality was deferred until the node device driver had persistent configuration, and because the workaround is so trivial (add a single line to a shell script somewhere), the need for this feature didn't raise the priority of enhancing the node device driver enough to support it.




Thanks,


DHB




[1] https://wiki.libvirt.org/page/Networking#Assignment_from_a_pool_of_SRIOV_VFs_in_a_libvirt_.3Cnetwork.3E_definition


Stopping the network with 'virsh net-destroy' will cause all VFs to be removed.


That is very dangerous and would need several checks before allowing it.


Similarly to when starting the network, the implementation will also need to check for running guests that are using the VFs, in order to prevent inadvertent hot-unplugging.
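
A sketch of the kind of check that would be needed (using the 'sriov-pool' network from the example above; a real implementation would use libvirt's internal records of running domains rather than shelling out to virsh):

    # sketch: refuse to remove the VFs while a running guest still uses the pool
    for dom in $(virsh list --name); do
        if virsh dumpxml "$dom" | grep -q "network='sriov-pool'"; then
            echo "domain $dom still has an interface on network sriov-pool" >&2
            exit 1
        fi
    done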

Is the functionality proposed above desirable?



In the end, I'd say I'm at best "ambivalent" about doing this. I think it would be better if we could do it via the node-device driver so that everyone could take advantage of it. On the other hand, I do also understand that that is a much more difficult proposition, and likely not to get implemented, and that it would be nice if the creation of VFs were handled "somehow" by libvirt. (BTW, if all users of VFs did so via a libvirt network, then I would probably 100% agree with your proposed implementation. From what I've heard, it's been less common than I envisioned when I implemented it, though.)

