On Wed, 22 Jan 2025 16:15:03 +0800 Yang Ming <ming.1.y...@nokia-sbell.com> wrote:
> On 2025/1/18 00:47, Stephen Hemminger wrote:
> > On Fri, 17 Jan 2025 15:28:47 +0800
> > Yang Ming <ming.1.y...@nokia-sbell.com> wrote:
> >
> >> DPDK detect vfio container according the existence of vfio
> >> module. But for container with non-privileged mode, there is
> >> possibility that no VFIO_DIR(/dev/vfio) mapping from host to
> >> container when host have both Intel NIC and Mellanox NIC but
> >> this container only allocate VFs from Mellanox NIC.
> >> In this case, vfio kernel module has already been loaded from
> >> the host.
> >> This scenario will cause the error log occurs in DPDK primary
> >> process as below:
> >> 'EAL: cannot open VFIO container, error 2 (No such file or
> >> directory)'
> >> 'EAL: VFIO support could not be initialized'
> >> Because `rte_vfio_enable()` call `rte_vfio_get_container_fd()`
> >> to execute `vfio_container_fd = open(VFIO_CONTAINER_PATH,
> >> O_RDWR);` but VFIO_CONTAINER_PATH(/dev/vfio/vfio) doesn't exist
> >> in this container.
> >> This scenario will also lead to the delay of DPDK secondary
> >> process because `default_vfio_cfg->vfio_enabled = 0` and
> >> `default_vfio_cfg->vfio_container_fd = -1`, socket error will
> >> be set in DPDK primary process when it sync this info to
> >> the secondary process.
> >> This patch use to skip this kind of useless detection for this
> >> scenario.
> >>
> >> Signed-off-by: Yang Ming <ming.1.y...@nokia-sbell.com>
> >> ---
> >>  lib/eal/linux/eal_vfio.c | 11 +++++++++++
> >>  1 file changed, 11 insertions(+)
> >>
> >> diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
> >> index 7132e24cba..1679d29263 100644
> >> --- a/lib/eal/linux/eal_vfio.c
> >> +++ b/lib/eal/linux/eal_vfio.c
> >> @@ -7,6 +7,7 @@
> >>  #include <fcntl.h>
> >>  #include <unistd.h>
> >>  #include <sys/ioctl.h>
> >> +#include <dirent.h>
> >>
> >>  #include <rte_errno.h>
> >>  #include <rte_log.h>
> >> @@ -1083,6 +1084,7 @@ rte_vfio_enable(const char *modname)
> >>  	/* initialize group list */
> >>  	int i, j;
> >>  	int vfio_available;
> >> +	DIR *dir;
> >>  	const struct internal_config *internal_conf =
> >>  		eal_get_internal_configuration();
> >>
> >> @@ -1119,6 +1121,15 @@ rte_vfio_enable(const char *modname)
> >>  		return 0;
> >>  	}
> >>
> >> +	/* return 0 if VFIO directory not exist for container with
> >> non-privileged mode */
> >> +	dir = opendir(VFIO_DIR);
> >> +	if (dir == NULL) {
> >> +		EAL_LOG(DEBUG,
> >> +			"VFIO directory not exist, skipping VFIO support...");
> >> +		return 0;
> >> +	}
> >> +	closedir(dir);
> > You need to test the non-container cases.
> > If vfio is loaded /dev/vfio is a character device (not a directory)
> >
> > Also looks suspicious that VFIO_DIR is defined but never used currently.
> >
> Hi Stephen,
> For non-container test, /dev/vfio/vfio will be character device, not
> /dev/vfio.
> Here is the command result on my testing environment with Intel NIC.
>
> [root@computer-1 testuser]# ls -l /dev/vfio
> total 0
> crw-rw-rw-. 1 root root 10, 196 Jan 22 01:50 vfio
> [root@computer-1 testuser]# dpdk-devbind.py -b vfio-pci 0000:04:10.2
> [root@computer-1 testuser]# ls -l /dev/vfio
> total 0
> crw-------. 1 root root 239, 0 Jan 22 01:52 59
> crw-rw-rw-. 1 root root 10, 196 Jan 22 01:50 vfio
> [root@computer-1 testuser]# dpdk-devbind.py -b ixgbevf 0000:04:10.2
> [root@computer-1 testuser]# ls -l /dev/vfio
> total 0
> crw-rw-rw-. 1 root root 10, 196 Jan 22 01:50 vfio
>
> Can you confirm your test scenario?
>

When vfio-pci is loaded but no device is bound:

$ ls -l /dev/vfio
total 0
crw-rw-rw- 1 root root 10, 196 Feb 26 05:39 vfio

After binding a device:

$ ls -l /dev/vfio
total 0
crw------- 1 root root 511, 0 Feb 26 05:42 15
crw-rw-rw- 1 root root 10, 196 Feb 26 05:39 vfio

So testing for /dev/vfio is a good indication that the module is loaded.
Not sure what I was thinking earlier.
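
For reference, the directory vs. character device distinction discussed above can be checked with stat(2). This is only a rough standalone sketch, not DPDK code: classify_path() and enum path_kind are hypothetical names made up for illustration, and the patch itself uses opendir(VFIO_DIR) instead.

```c
#include <sys/stat.h>

/* Hypothetical helper for illustration only; not part of DPDK. */
enum path_kind {
	PATH_MISSING,	/* stat() failed, e.g. ENOENT in an unprivileged container */
	PATH_DIR,	/* directory, as /dev/vfio is in the listings above */
	PATH_CHARDEV,	/* character device, as /dev/vfio/vfio is in the listings above */
	PATH_OTHER	/* anything else (regular file, socket, ...) */
};

/* Classify a path by its file type using stat(2). */
static enum path_kind
classify_path(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return PATH_MISSING;
	if (S_ISDIR(st.st_mode))
		return PATH_DIR;
	if (S_ISCHR(st.st_mode))
		return PATH_CHARDEV;
	return PATH_OTHER;
}
```

With the listings above, classify_path("/dev/vfio") would be expected to give PATH_DIR on the host, and PATH_MISSING inside a container that has no /dev/vfio mapping.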