Hi Anatoly,

> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Thursday, April 12, 2018 10:04 PM
> To: Wang, Xiao W <xiao.w.w...@intel.com>; Yigit, Ferruh
> <ferruh.yi...@intel.com>
> Cc: dev@dpdk.org; maxime.coque...@redhat.com; Wang, Zhihong
> <zhihong.w...@intel.com>; Bie, Tiwei <tiwei....@intel.com>; Tan, Jianfeng
> <jianfeng....@intel.com>; Liang, Cunming <cunming.li...@intel.com>; Daly,
> Dan <dan.d...@intel.com>; tho...@monjalon.net; gaetan.ri...@6wind.com;
> hemant.agra...@nxp.com; Chen, Junjie J <junjie.j.c...@intel.com>
> Subject: Re: [PATCH v6 1/4] eal/vfio: add multiple container support
> 
> On 12-Apr-18 8:19 AM, Xiao Wang wrote:
> > Currently eal vfio framework binds vfio group fd to the default
> > container fd during rte_vfio_setup_device, while in some cases,
> > e.g. vDPA (vhost data path acceleration), we want to put vfio group
> > to a separate container and program IOMMU via this container.
> >
> > This patch adds some APIs to support container creating and device
> > binding with a container.
> >
> > A driver could use "rte_vfio_create_container" helper to create a
> > new container from eal, use "rte_vfio_bind_group" to bind a device
> > to the newly created container.
> >
> > During rte_vfio_setup_device, the container bound with the device
> > will be used for IOMMU setup.
> >
> > Signed-off-by: Junjie Chen <junjie.j.c...@intel.com>
> > Signed-off-by: Xiao Wang <xiao.w.w...@intel.com>
> > Reviewed-by: Maxime Coquelin <maxime.coque...@redhat.com>
> > Reviewed-by: Ferruh Yigit <ferruh.yi...@intel.com>
> > ---
> 
> Apologies for late review. Some comments below.
> 
> <...>
> 
> >
> > +struct rte_memseg;
> > +
> >   /**
> >    * Setup vfio_cfg for the device identified by its address.
> >    * It discovers the configured I/O MMU groups or sets a new one for the
> device.
> > @@ -131,6 +133,117 @@ rte_vfio_clear_group(int vfio_group_fd);
> >   }
> >   #endif
> >
> 
> <...>
> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Perform dma mapping for devices in a container.
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param dma_type
> > + *   the dma map type
> > + *
> > + * @param ms
> > + *   the dma address region to map
> > + *
> > + * @return
> > + *    0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_map(int container_fd, int dma_type, const struct
> rte_memseg *ms);
> > +
> 
> First of all, why memseg, instead of va/iova/len? This seems like
> unnecessary attachment to internals of DPDK memory representation. Not
> all memory comes in memsegs, this makes the API unnecessarily specific
> to DPDK memory.

Agree, will use va/iova/len.

> 
> Also, why providing DMA type? There's already a VFIO type pointer in
> vfio_config - you can set this pointer for every new created container,
> so the user wouldn't have to care about IOMMU type. Is it not possible
> to figure out DMA type from within EAL VFIO? If not, maybe provide an
> API to do so, e.g. rte_vfio_container_set_dma_type()?

It's possible; EAL VFIO should be able to figure out a container's DMA type.

> 
> This will also need to be rebased on top of latest HEAD because there
> already is a similar DMA map/unmap API added, only without the container
> parameter. Perhaps rename these new functions to
> rte_vfio_container_(create|destroy|dma_map|dma_unmap)?

OK, will check the latest HEAD and rebase on that.

> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice
> > + *
> > + * Perform dma unmapping for devices in a container.
> > + *
> > + * @param container_fd
> > + *   the specified container fd
> > + *
> > + * @param dma_type
> > + *    the dma map type
> > + *
> > + * @param ms
> > + *   the dma address region to unmap
> > + *
> > + * @return
> > + *    0 if successful
> > + *   <0 if failed
> > + */
> > +int __rte_experimental
> > +rte_vfio_dma_unmap(int container_fd, int dma_type, const struct
> rte_memseg *ms);
> > +
> >   #endif /* VFIO_PRESENT */
> >
> 
> <...>
> 
> > @@ -75,8 +53,8 @@ vfio_get_group_fd(int iommu_group_no)
> >             if (vfio_group_fd < 0) {
> >                     /* if file not found, it's not an error */
> >                     if (errno != ENOENT) {
> > -                           RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> filename,
> > -                                           strerror(errno));
> > +                           RTE_LOG(ERR, EAL, "Cannot open %s: %s\n",
> > +                                   filename, strerror(errno));
> 
> This looks like unintended change.
> 
> >                             return -1;
> >                     }
> >
> > @@ -86,8 +64,10 @@ vfio_get_group_fd(int iommu_group_no)
> >                     vfio_group_fd = open(filename, O_RDWR);
> >                     if (vfio_group_fd < 0) {
> >                             if (errno != ENOENT) {
> > -                                   RTE_LOG(ERR, EAL, "Cannot
> open %s: %s\n", filename,
> > -                                                   strerror(errno));
> > +                                   RTE_LOG(ERR, EAL,
> > +                                           "Cannot open %s: %s\n",
> > +                                           filename,
> > +                                           strerror(errno));
> 
> This looks like unintended change.
> 
> >                                     return -1;
> >                             }
> >                             return 0;
> > @@ -95,21 +75,19 @@ vfio_get_group_fd(int iommu_group_no)
> >                     /* noiommu group found */
> >             }
> >
> > -           cur_grp->group_no = iommu_group_no;
> > -           cur_grp->fd = vfio_group_fd;
> > -           vfio_cfg.vfio_active_groups++;
> >             return vfio_group_fd;
> >     }
> > -   /* if we're in a secondary process, request group fd from the primary
> > +   /*
> > +    * if we're in a secondary process, request group fd from the primary
> >      * process via our socket
> >      */
> 
> This looks like unintended change.
> 
> >     else {
> > -           int socket_fd, ret;
> > -
> > -           socket_fd = vfio_mp_sync_connect_to_primary();
> > +           int ret;
> > +           int socket_fd = vfio_mp_sync_connect_to_primary();
> >
> >             if (socket_fd < 0) {
> > -                   RTE_LOG(ERR, EAL, "  cannot connect to primary
> process!\n");
> > +                   RTE_LOG(ERR, EAL,
> > +                           "  cannot connect to primary process!\n");
> 
> This looks like unintended change.
> 
> >                     return -1;
> >             }
> >             if (vfio_mp_sync_send_request(socket_fd,
> SOCKET_REQ_GROUP) < 0) {
> > @@ -122,6 +100,7 @@ vfio_get_group_fd(int iommu_group_no)
> >                     close(socket_fd);
> >                     return -1;
> >             }
> > +
> >             ret = vfio_mp_sync_receive_request(socket_fd);
> 
> This looks like unintended change.
> 
> (hint: "git revert -n HEAD && git add -p" is your friend :) )

Thanks, will remove these diffs.
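The workflow hinted at above can be demonstrated in a throwaway repo: `git revert -n HEAD` stages the inverse of the noisy commit without committing it, after which `git add -p` (interactive, so not scripted here) lets you pick back only the hunks you meant to keep. The file names and commit messages are made up for the demo:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q .
git -c user.email=a@b -c user.name=t commit -q --allow-empty -m base
echo "intended change" > f.c
git add f.c
git -c user.email=a@b -c user.name=t commit -q -m "noisy commit"
git revert -n HEAD      # stage the inverse of HEAD, no commit made
git status --short      # the staged deletion of f.c shows up as "D  f.c"
# at this point you would run `git add -p` to re-add the intended hunks
```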

> 
> >             switch (ret) {
> >             case SOCKET_NO_FD:
> > @@ -132,9 +111,6 @@ vfio_get_group_fd(int iommu_group_no)
> >                     /* if we got the fd, store it and return it */
> >                     if (vfio_group_fd > 0) {
> >                             close(socket_fd);
> > -                           cur_grp->group_no = iommu_group_no;
> > -                           cur_grp->fd = vfio_group_fd;
> > -                           vfio_cfg.vfio_active_groups++;
> >                             return vfio_group_fd;
> >                     }
> >                     /* fall-through on error */
> > @@ -147,70 +123,349 @@ vfio_get_group_fd(int iommu_group_no)
> >     return -1;
> 
> <...>
> 
> > +int __rte_experimental
> > +rte_vfio_create_container(void)
> > +{
> > +   struct vfio_config *vfio_cfg;
> > +   int i;
> > +
> > +   /* Find an empty slot to store new vfio config */
> > +   for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
> > +           if (vfio_cfgs[i] == NULL)
> > +                   break;
> > +   }
> > +
> > +   if (i == VFIO_MAX_CONTAINERS) {
> > +           RTE_LOG(ERR, EAL, "exceed max vfio container limit\n");
> > +           return -1;
> > +   }
> > +
> > +   vfio_cfgs[i] = rte_zmalloc("vfio_container", sizeof(struct vfio_config),
> > +           RTE_CACHE_LINE_SIZE);
> > +   if (vfio_cfgs[i] == NULL)
> > +           return -ENOMEM;
> 
> Is there a specific reason why 1) dynamic allocation is used (as opposed
> to just storing a static array), and 2) DPDK memory allocation is used?
> This seems like unnecessary complication.
> 
> Even if you were to decide to allocate memory instead of having a static
> array, you'll have to register for rte_eal_cleanup() to delete any
> allocated containers on DPDK exit. But, as i said, i think it would be
> better to keep it as static array.
>

Thanks for the suggestion; a static array looks simpler and cleaner.
 
> > +
> > +   RTE_LOG(INFO, EAL, "alloc container at slot %d\n", i);
> > +   vfio_cfg = vfio_cfgs[i];
> > +   vfio_cfg->vfio_active_groups = 0;
> > +   vfio_cfg->vfio_container_fd = vfio_get_container_fd();
> > +
> > +   if (vfio_cfg->vfio_container_fd < 0) {
> > +           rte_free(vfio_cfgs[i]);
> > +           vfio_cfgs[i] = NULL;
> > +           return -1;
> > +   }
> > +
> > +   for (i = 0; i < VFIO_MAX_GROUPS; i++) {
> > +           vfio_cfg->vfio_groups[i].group_no = -1;
> > +           vfio_cfg->vfio_groups[i].fd = -1;
> > +           vfio_cfg->vfio_groups[i].devices = 0;
> > +   }
> 
> <...>
> 
> > @@ -665,41 +931,80 @@ vfio_get_group_no(const char *sysfs_base,
> >   }
> >
> >   static int
> > -vfio_type1_dma_map(int vfio_container_fd)
> > +do_vfio_type1_dma_map(int vfio_container_fd, const struct rte_memseg
> *ms)
> 
> <...>
> 
> 
> > +static int
> > +do_vfio_type1_dma_unmap(int vfio_container_fd, const struct
> rte_memseg *ms)
> 
> API's such as these two were recently added to DPDK.

Will check and rebase.

BRs,
Xiao

> 
> --
> Thanks,
> Anatoly
