On 11/07/2019 18:19, Will Deacon wrote:
Hi everyone,

This is a significant rework of the RFC I previously posted here:

  https://lkml.kernel.org/r/[email protected]

But this time, it looks like it might actually be worthwhile according
to my perf profiles, where __iommu_unmap() falls a long way down the
profile for a multi-threaded netperf run. I'm still relying on others to
confirm this is useful, however.

Some of the changes since last time are:

  * Support for constructing and submitting a list of commands in the
    driver

  * Numerous changes to the IOMMU and io-pgtable APIs so that we can
    submit commands in batches

  * Removal of cmpxchg() from cmdq_shared_lock() fast-path

  * Code restructuring and cleanups

This currently applies against my iommu/devel branch, which Joerg has pulled
for 5.3. If you want to test it out, I've put everything here:

  https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq

Feedback welcome. I appreciate that we're in the merge window, but I
wanted to get this on the list for people to look at as an RFC.


I tested storage performance on this series. I think that is a better scenario to test than networking, as network performance is generally limited by the link speed.

Results:

Baseline performance (will/iommu/devel, commit 9e6ea59f3)
8x SAS disks D05        839K IOPS
1x NVMe D05             454K IOPS
1x NVMe D06             442K IOPS

Patchset performance (will/iommu/cmdq)
8x SAS disks D05        835K IOPS
1x NVMe D05             472K IOPS
1x NVMe D06             459K IOPS

So we see a small NVMe boost (roughly 4%), but about the same IOPS for the 8x SAS disks. Without the IOMMU, the 8x disks reach about 918K IOPS, so the 8x SAS result is not limited by the medium.

The D06 is a bit memory-starved, which may account for its generally lower NVMe performance.

John

Cheers,

Will

--->8

Cc: Jean-Philippe Brucker <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Jayachandran Chandrasekharan Nair <[email protected]>
Cc: Jan Glauber <[email protected]>
Cc: Jon Masters <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Zhen Lei <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Vijay Kilary <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: John Garry <[email protected]>
Cc: Alex Williamson <[email protected]>

Will Deacon (19):
  iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops
  iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
  iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops
  iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
  iommu: Introduce iommu_iotlb_gather_add_page()
  iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync()
  iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf()
  iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in
    drivers
  iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf()
  iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page()
  iommu/io-pgtable: Remove unused ->tlb_sync() callback
  iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap()
  iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page()
  iommu/arm-smmu-v3: Separate s/w and h/w views of prod and cons indexes
  iommu/arm-smmu-v3: Drop unused 'q' argument from Q_OVF macro
  iommu/arm-smmu-v3: Move low-level queue fields out of arm_smmu_queue
  iommu/arm-smmu-v3: Operate directly on low-level queue where possible
  iommu/arm-smmu-v3: Reduce contention during command-queue insertion
  iommu/arm-smmu-v3: Defer TLB invalidation until ->iotlb_sync()

 drivers/gpu/drm/panfrost/panfrost_mmu.c |  24 +-
 drivers/iommu/amd_iommu.c               |  11 +-
 drivers/iommu/arm-smmu-v3.c             | 856 ++++++++++++++++++++++++--------
 drivers/iommu/arm-smmu.c                | 103 +++-
 drivers/iommu/dma-iommu.c               |   9 +-
 drivers/iommu/exynos-iommu.c            |   3 +-
 drivers/iommu/intel-iommu.c             |   3 +-
 drivers/iommu/io-pgtable-arm-v7s.c      |  57 +--
 drivers/iommu/io-pgtable-arm.c          |  48 +-
 drivers/iommu/iommu.c                   |  24 +-
 drivers/iommu/ipmmu-vmsa.c              |  28 +-
 drivers/iommu/msm_iommu.c               |  42 +-
 drivers/iommu/mtk_iommu.c               |  45 +-
 drivers/iommu/mtk_iommu_v1.c            |   3 +-
 drivers/iommu/omap-iommu.c              |   2 +-
 drivers/iommu/qcom_iommu.c              |  44 +-
 drivers/iommu/rockchip-iommu.c          |   2 +-
 drivers/iommu/s390-iommu.c              |   3 +-
 drivers/iommu/tegra-gart.c              |  12 +-
 drivers/iommu/tegra-smmu.c              |   2 +-
 drivers/vfio/vfio_iommu_type1.c         |  27 +-
 include/linux/io-pgtable.h              |  57 ++-
 include/linux/iommu.h                   |  92 +++-
 23 files changed, 1090 insertions(+), 407 deletions(-)



_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
