On Tue, Feb 21, 2017 at 01:41:31PM +1100, Alexey Kardashevskiy wrote:
> On the POWERNV platform, in order to do DMA via the IOMMU (i.e. 32bit DMA in
> our case), a device needs an iommu_table pointer set via
> set_iommu_table_base().
> 
> The codeflow is:
> - pnv_pci_ioda2_setup_dma_pe()
>       - pnv_pci_ioda2_setup_default_config()
>       - pnv_ioda_setup_bus_dma() [1]
> 
> pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
> pnv_pci_ioda2_setup_default_config() does the default DMA setup, and
> pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
> PEs are bus PEs except NPU PEs), walks through all underlying buses and
> devices, adds every device to an IOMMU group and sets its iommu_table.
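The recursive walk described above can be modelled in user space roughly as follows. This is only a sketch: `struct dev`, `struct bus`, `struct pe` and `setup_bus_dma()` are hypothetical simplifications, not the kernel's `struct pci_dev`, `struct pci_bus` or `struct pnv_ioda_pe`, and the `in_group` field merely marks where iommu_add_device() would be called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not kernel definitions. */
struct bus;

struct dev {
	struct bus *subordinate;	/* bus behind a bridge, or NULL */
	const void *iommu_table;	/* what set_iommu_table_base() would store */
	int in_group;			/* set where iommu_add_device() would run */
};

struct bus {
	struct dev *devices;
	size_t ndev;
};

struct pe {
	const void *table0;		/* models pe->table_group.tables[0] */
	int bus_all;			/* models PNV_IODA_PE_BUS_ALL */
};

/*
 * Point every device on @bus at the PE's table and mark it as grouped;
 * descend into subordinate buses only for "bus all" PEs.
 */
static void setup_bus_dma(const struct pe *pe, struct bus *bus)
{
	for (size_t i = 0; i < bus->ndev; i++) {
		struct dev *d = &bus->devices[i];

		d->iommu_table = pe->table0;
		d->in_group = 1;
		if (pe->bus_all && d->subordinate)
			setup_bus_dma(pe, d->subordinate);
	}
}
```

Without the "bus all" flag, only devices directly on the PE's bus are touched; devices behind bridges are left alone.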
> 
> On IODA2, when VFIO is used, it takes ownership of a PE, which means it
> removes all tables and creates new ones (possibly sharing them among
> PEs). So when ownership is returned from VFIO to the kernel, the
> iommu_table pointer written to a device at [1] is stale and needs
> updating.
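The staleness can be illustrated with a small user-space model. It is a sketch under assumptions: `struct table`, `struct pe`, `struct dev`, `boot()`, `vfio_take()` and `vfio_release()` are hypothetical, and tables come from a static pool instead of being freed, so the pointers can be compared without touching freed memory (in the kernel the old table really is freed).

```c
#include <assert.h>

/* Hypothetical model: a "table" is just an identifiable object. */
struct table { int id; };

struct pe { struct table *table0; };		/* models tables[0] */
struct dev { struct table *iommu_table; };	/* per-device cached base */

/* Static pool instead of alloc/free so stale pointers stay comparable. */
static struct table pool[3] = { { 0 }, { 1 }, { 2 } };

/* Boot: default config creates a table and the device caches it. */
static void boot(struct pe *pe, struct dev *d)
{
	pe->table0 = &pool[0];
	d->iommu_table = pe->table0;
}

/* VFIO takes ownership: the kernel table goes, VFIO installs its own. */
static void vfio_take(struct pe *pe)
{
	pe->table0 = &pool[1];
}

/* Ownership returned: a fresh default table is built, but nothing
 * rewrites the pointer the device cached at boot. */
static void vfio_release(struct pe *pe)
{
	pe->table0 = &pool[2];
}
```

After one take/release cycle the device still refers to the boot-time table while the PE has moved on.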
> 
> This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
> (in fact, it re-adds it, as the parameter existed a while ago for
> different reasons) to tell the helper whether a device needs to be added
> to an IOMMU group and have its iommu_table updated, or only the latter.
> 
> This calls pnv_ioda_setup_bus_dma(..., false) from
> pnv_ioda2_release_ownership() so that when ownership is restored,
> 32bit DMA can work again for a device. It does the same on obtaining
> ownership, as the iommu_table pointer is stale at that point anyway
> and it is safer to have NULL there.
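A minimal sketch of the fix, again with hypothetical simplified types rather than the kernel's: `setup_bus_dma()` here only mirrors the patched pnv_ioda_setup_bus_dma() in that it always refreshes the table pointer and counts a group addition only when `add_to_group` is true. The bus hierarchy is flattened into one device array, and `take_ownership()`/`release_ownership()` model only the ordering of the real callbacks, not their table management.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat model: one PE owning a fixed array of devices. */
struct table { int id; };

struct dev {
	struct table *iommu_table;	/* models set_iommu_table_base() */
	int group_adds;			/* counts iommu_add_device()-style calls */
};

struct pe {
	struct table *table0;		/* models pe->table_group.tables[0] */
	struct dev *devs;
	size_t ndev;
};

/* Always refresh the cached table; add to the group only on request. */
static void setup_bus_dma(struct pe *pe, bool add_to_group)
{
	for (size_t i = 0; i < pe->ndev; i++) {
		pe->devs[i].iommu_table = pe->table0;
		if (add_to_group)
			pe->devs[i].group_adds++;
	}
}

/* Take ownership: drop the kernel table, then clear stale pointers. */
static void take_ownership(struct pe *pe)
{
	pe->table0 = NULL;
	setup_bus_dma(pe, false);
}

/* Release ownership: install the new default table, then repoint devices. */
static void release_ownership(struct pe *pe, struct table *fresh)
{
	pe->table0 = fresh;
	setup_bus_dma(pe, false);
}
```

Passing false on both transitions keeps each device in its group exactly once, while the cached pointer tracks tables[0] (NULL while VFIO owns the PE).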
> 
> We did not hit this earlier as all devices tested in recent years used
> only 64bit DMA; the rare exception is the MPT3 SAS adapter, which uses
> both 32bit and 64bit DMA and has not been tested much with VFIO.
> 
> Cc: Gavin Shan <gws...@linux.vnet.ibm.com>
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>

Reviewed-by: David Gibson <da...@gibson.dropbear.id.au>

> ---
> 
> If this is applied before "powerpc/powernv/npu: Remove dead iommu code",
> there will be a minor conflict.
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 51ec0dc1dfde..f5a2421bf164 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1774,17 +1774,20 @@ static u64 pnv_pci_ioda_dma_get_required_mask(struct pci_dev *pdev)
>  }
>  
>  static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
> -                                struct pci_bus *bus)
> +                                struct pci_bus *bus,
> +                                bool add_to_group)
>  {
>       struct pci_dev *dev;
>  
>       list_for_each_entry(dev, &bus->devices, bus_list) {
>               set_iommu_table_base(&dev->dev, pe->table_group.tables[0]);
>               set_dma_offset(&dev->dev, pe->tce_bypass_base);
> -             iommu_add_device(&dev->dev);
> +             if (add_to_group)
> +                     iommu_add_device(&dev->dev);
>  
>               if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
> -                     pnv_ioda_setup_bus_dma(pe, dev->subordinate);
> +                     pnv_ioda_setup_bus_dma(pe, dev->subordinate,
> +                                     add_to_group);
>       }
>  }
>  
> @@ -2190,7 +2193,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>               set_iommu_table_base(&pe->pdev->dev, tbl);
>               iommu_add_device(&pe->pdev->dev);
>       } else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
> -             pnv_ioda_setup_bus_dma(pe, pe->pbus);
> +             pnv_ioda_setup_bus_dma(pe, pe->pbus, true);
>  
>       return;
>   fail:
> @@ -2425,6 +2428,8 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
>  
>       pnv_pci_ioda2_set_bypass(pe, false);
>       pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> +     if (pe->pbus)
> +             pnv_ioda_setup_bus_dma(pe, pe->pbus, false);
>       pnv_ioda2_table_free(tbl);
>  }
>  
> @@ -2434,6 +2439,8 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
>                                               table_group);
>  
>       pnv_pci_ioda2_setup_default_config(pe);
> +     if (pe->pbus)
> +             pnv_ioda_setup_bus_dma(pe, pe->pbus, false);
>  }
>  
>  static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> @@ -2725,7 +2732,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>               return;
>  
>       if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
> -             pnv_ioda_setup_bus_dma(pe, pe->pbus);
> +             pnv_ioda_setup_bus_dma(pe, pe->pbus, true);
>  }
>  
>  #ifdef CONFIG_PCI_MSI

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
