-----Original Message----- > Date: Thu, 6 Jul 2017 12:59:04 +0200 > From: Maxime Coquelin <maxime.coque...@redhat.com> > To: Jerin Jacob <jerin.ja...@caviumnetworks.com> > CC: Santosh Shukla <santosh.shu...@caviumnetworks.com>, > tho...@monjalon.net, bruce.richard...@intel.com, dev@dpdk.org, > hemant.agra...@nxp.com, shreyansh.j...@nxp.com, gaetan.ri...@6wind.com > Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode > before mapping > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 > Thunderbird/52.1.0 > > > > On 07/06/2017 11:49 AM, Jerin Jacob wrote: > > -----Original Message----- > > > Date: Thu, 6 Jul 2017 09:58:41 +0200 > > > From: Maxime Coquelin <maxime.coque...@redhat.com> > > > To: Jerin Jacob <jerin.ja...@caviumnetworks.com> > > > CC: Santosh Shukla <santosh.shu...@caviumnetworks.com>, > > > tho...@monjalon.net, bruce.richard...@intel.com, dev@dpdk.org, > > > hemant.agra...@nxp.com, shreyansh.j...@nxp.com, gaetan.ri...@6wind.com > > > Subject: Re: [dpdk-dev] [PATCH 07/10] linuxapp/eal_vfio: honor iova mode > > > before mapping > > > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 > > > Thunderbird/52.1.0 > > > > > > > > > > > > > > > > > > > > > > > > On 06/08/2017 01:05 PM, Santosh Shukla wrote: > > > > > > Check iova mode and accordingly map iova to pa or va. > > > > > > > > > > > > Signed-off-by: Santosh Shukla<santosh.shu...@caviumnetworks.com> > > > > > > Signed-off-by: Jerin Jacob<jerin.ja...@caviumnetworks.com> > > > > > > --- > > > > > > lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++-- > > > > > > 1 file changed, 8 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c > > > > > > b/lib/librte_eal/linuxapp/eal/eal_vfio.c > > > > > > index 04914406f..348b7a7f4 100644 > > > > > > --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c > > > > > > +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c > > > > > > @@ -706,7 +706,10 @@ vfio_type1_dma_map(int vfio_container_fd) > > > > > > dma_map.argsz = sizeof(struct > > > > > > vfio_iommu_type1_dma_map); > > > > > > dma_map.vaddr = ms[i].addr_64; > > > > > > dma_map.size = ms[i].len; > > > > > > - dma_map.iova = ms[i].phys_addr; > > > > > > + if (rte_eal_iova_mode() == RTE_IOVA_VA) > > > > > > + dma_map.iova = dma_map.vaddr; > > > > > > + else > > > > > > + dma_map.iova = ms[i].phys_addr; > > > > > > dma_map.flags = VFIO_DMA_MAP_FLAG_READ | > > > > > > VFIO_DMA_MAP_FLAG_WRITE; > > > > > > > > > > IIUC, it is changing default behavior for VFIO devices. > > > > > > > > > > I see a possible problem, but I'm not sure the case is valid. > > > > > > > > > > Imagine you have two devices in the iommu group, and the two devices > > > > > are > > > > > used in separate processes. Each process could try two different > > > > > physical addresses at the same virtual address, and so the second map > > > > > would fail. > > > > > > > > IMO, Doesn't look like a problem. Here is the data flow > > > > > > > > 1) The vfio DMA map function(vfio_type1_dma_map()) will be called only > > > > on primary process > > > > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_vfio.c#n359 > > > > > > > > 2) On secondary process, DPDK rte_eal_huge_page_attach() will make sure > > > > that, the Secondary process has the _same_ virtual address as primary or > > > > exit from on attach. > > > > http://dpdk.org/browse/dpdk/tree/lib/librte_eal/linuxapp/eal/eal_memory.c#n1452 > > > > > > > > 3) Since secondary process adds the mapped the virtual address in step > > > > (2). > > > > in the page table in OS. On SMMU entry miss(When device > > > > request from I/O transaction), OS will load the mapping and update the > > > > SMMU > > > > "context" with page tables from MMU. > > > > > > Ok thanks for the detailed info, but what about the case where the same > > > iommu group is used by two primary processes? > > > > Does that case exist with DPDK? We always need to blacklist same BDF in > > the secondary process to make things work with existing DPDK setup. Which > > make sense as well. Only primary process configures the HW blocks. > > I meant the case when two BDF are in the same IOMMU group (if ACS is not > supported at some point in the hierarchy). And I meant two primary > processes running, like for example two containers running each a DPDK > application. > > Maybe this is not a valid use-case (it is not secure, as it would break > isolation between the two containers), but it seems that it is something > DPDK allows today, if I'm not mistaken.
Not sure. Doesn't seems to valid case with VFIO as without ACS anyway it will break security (the all point of IOMMU protection == VFIO) > > > > > > > I don't know how frequent it is, but if ACS is not supported by either the > > > endpoint or the the root port, then you would have to share the same IOMMU > > > group for all the ports of your card. Right? > > > > ACS is supported in our card(it not in bypass mode) and one mempool PCI BDF > > comes as a IOMMU group. > > > > If it in bypass mode anyway you use in vfio-noiommu mode as > > there is no protection anyway. > > > What concerns me is that we change the default behavior for all other > > > devices. Having an option to override is fine to me, but the default > > > mode should remain the same IMHO. > > > > Doesn't seems to be a technical point. But I agree with your concern. > > we will address it. > > I think, we have two ways to address it. > > > > option 1: > > - In existing patch, > > a) we are currently setting(internal_cfg->iova_mode = RTE_IOVA_PA) > > http://dpdk.org/dev/patchwork/patch/25192 > > b) only when with eal argument sets to RTE_IOVA_VA and then bus probed > > value == RTE_IOVA_VA the final mode will be RTE_IOVA_VA > > http://dpdk.org/dev/patchwork/patch/25193/ > > check the code after rte_bus_scan() > > > > option 2: > > On rte_pci_get_iommu_class() in http://dpdk.org/dev/patchwork/patch/25190/ > > we can check the rte_pci_device.id.vendor_id == CAVIUM to select the > > mode so other type of devices safe. > > > > I think, option 2 makes sense, as it gives foolproof auto detection scheme > > and > > without effecting any other devices that not interested in this scheme > > > > Does that address your concern about the patchset? > > Yes it does, or maybe create a new flag in struct rte_pci_driver's > drv_flags to provide a hint it prefers to use VA as IOVA? > > It, of course, would just be a hint, and should be set only when other > conditions are met. Yes. Makes sense. We will roll out v2 with option2 + your rte_pci_driver suggestion. Thanks a lot for the review. Appreciated.