On Wed, 21 Mar 2018 15:46:01 +0000 Ciprian Barbu <ciprian.ba...@enea.com> wrote:
> Hello,
>
> In the context of running Openstack on a cluster of Cavium ThunderX cn8890
> aarch64 servers, we are trying to attach virtual functions to a VM.
>
> First some introduction. This Cavium SoC has a different approach to Virtual
> Functions than on x86 NICs, in which VFs are always enabled and there are two
> types of VFs and *one single* PF, as follows:
> - primary VFs - these are in fact assigned by the system to the physical
>   ports of the server, e.g em2p1s0f1, em2p1s0f3 etc below.
> - secondary VFs - the main purpose of these is to provide additional HW
>   queues under SW control (usually DPDK applications) by automatically binding
>   them to the needed physical port.
> - one single "physical" function, device 0002:01:00.0 below, which to the
>   best of my knowledge acts merely as a stub and cannot be assigned an
>   interface name.
>
> Below is the output of "dpdk-devbind.py -s" which provides some useful
> information.
>
> Network devices using DPDK-compatible driver
> ============================================
> 0002:01:00.2 'Device a034' drv=vfio-pci unused=nicvf
>
> Network devices using kernel driver
> ===================================
> 0000:01:10.0 'THUNDERX BGX (Common Ethernet Interface)' if= drv=thunder-BGX unused=thunder_bgx,vfio-pci
> 0000:01:10.1 'THUNDERX BGX (Common Ethernet Interface)' if= drv=thunder-BGX unused=thunder_bgx,vfio-pci
> 0002:01:00.0 'THUNDERX Network Interface Controller' if= drv=thunder-nic unused=nicpf,vfio-pci
> 0002:01:00.1 'Device a034' if=em2p1s0f1 drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.3 'Device a034' if=em2p1s0f3 drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.4 'Device a034' if=em2p1s0f4 drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.5 'Device a034' if=em2p1s0f5 drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.6 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.7 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:01.0 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
>
> Now for the problem. I don't have a domain definition because libvirt fails
> to start a domain, but I might be able to find what nova generates. But what
> it tries to do is passthrough em2p1s0f3, address 0002:01:00.3:
> <interface type='hostdev' managed='yes'>
>   <source>
>     <address type='pci' domain='0x0002' bus='0x1' slot='0x0' function='0x3'/>
>   </source>
> </interface>

When you use an <interface> definition, I believe libvirt is
interpreting this specifically as a network device and perhaps expects
to find an interface on the pf through which it can do setup. You can
also specify assigned devices via a <hostdev> entry, such as:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0002' bus='0x1' slot='0x0' function='0x3'/>
  </source>
</hostdev>

In which case libvirt shouldn't care that the device is a VF and should
have no dependency on a PF interface (or ability to configure the VF
via the PF), I think. Cc'ing libvirt experts.

There's a proposed stub driver in the upstream kernel that would also
act in a similar fashion, the host PF driver is nothing more than a
stub that enables the VFs, so libvirt would need to handle those VFs in
a way that has no dependency on the PF being a network interface, or
any other sort of interface. Thanks,

Alex

> You can find attached a trimmed libvirtd.log where the main error is:
> 43236: error : virPCIGetVirtualFunctionInfo:2927 : internal error: The PF
> device for VF /sys/bus/pci/devices/0002:01:00.3 has no network device name
>
> I have actually spent a few days trying to do some hacks and learn some more.
> The main idea is that virPCIGetVirtualFunctionInfo fails to find the physical
> name for the virtual device at address 0002:01:00.3, which as I explained in
> the introduction is something that this Cavium SoC does not do.
>
> Looking further down the stream, almost all of the helper functions need a
> linkdev for the physical function, which means that making libvirt work on
> this system means some heavy refactoring, a solution being to use the sysfs
> path rather than the interface name.
> This will not work 100% from what I've seen, at least virNetDevGetVfConfig
> uses netlink to save the admin MAC (part of virNetDevSaveNetConfig), and
> netlink needs the ifname.
>
> So I'm quite stuck on finding a workaround/fix for this platform which would
> potentially be something upstreamable, so that we, ENEA, don't burden with
> maintaining an ugly hack. Right now we are using libvirt 3.5.0 but we can
> upgrade to something newer if need.
>
> The question(s) thus, are
> 1. is this problem known in the libvirt community?
> 2. Is there any plan to make it work?
> 3. Can you give some pointers on an approach to adapt libvirt to this system?
> 4. Maybe it's worth changing the kernel to assign a sort of dummy interface
>    to the physical function?
>
> Thanks and sorry for the long email,
> /Ciprian

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
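
For anyone who wants to try the <hostdev> suggestion from the reply above
without hand-editing XML, here is a minimal Python sketch. It is only an
illustration, not anything libvirt or nova ships: the helper names
hostdev_xml() and pf_has_netdev() are made up for this example. The second
helper assumes the usual sysfs layout, where a VF has a "physfn" link to its
PF and a PF with a network interface has a non-empty "net" directory; that is
my reading of the "has no network device name" error, not a copy of libvirt's
logic.

```python
import os
import re
import xml.etree.ElementTree as ET

# Full PCI address as printed by dpdk-devbind.py, e.g. "0002:01:00.3".
PCI_ADDR_RE = re.compile(
    r"^(?P<domain>[0-9a-fA-F]{4}):(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<slot>[0-9a-fA-F]{2})\.(?P<function>[0-7])$"
)


def hostdev_xml(pci_addr):
    """Return a <hostdev> element (as a string) for the given PCI address.

    Hypothetical helper: mirrors the <hostdev> snippet from the reply,
    with the address fields filled in from pci_addr.
    """
    m = PCI_ADDR_RE.match(pci_addr)
    if m is None:
        raise ValueError("not a PCI address: %r" % (pci_addr,))
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    ET.SubElement(hostdev, "driver", name="vfio")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        type="pci",
        domain="0x" + m.group("domain"),
        bus="0x" + m.group("bus"),
        slot="0x" + m.group("slot"),
        function="0x" + m.group("function"),
    )
    return ET.tostring(hostdev, encoding="unicode")


def pf_has_netdev(vf_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return True if the VF's PF exposes at least one network interface.

    Assumption: <sysfs_root>/<vf>/physfn/net/ lists the PF's interfaces.
    sysfs_root is a parameter only so the check can be tried on a fake tree.
    """
    net_dir = os.path.join(sysfs_root, vf_addr, "physfn", "net")
    return os.path.isdir(net_dir) and bool(os.listdir(net_dir))


if __name__ == "__main__":
    vf = "0002:01:00.3"
    print(hostdev_xml(vf))
    print("PF has a netdev:", pf_has_netdev(vf))
```

On the ThunderX host described above I would expect pf_has_netdev() to report
False for 0002:01:00.3, since the PF at 0002:01:00.0 shows an empty "if=" in
the devbind output. The generated element writes bus and slot as 0x01 and
0x00 rather than 0x1 and 0x0; both are the same hex values.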