On 11/07/2019 18:19, Will Deacon wrote:
Hi everyone,
This is a significant rework of the RFC I previously posted here:
https://lkml.kernel.org/r/[email protected]
But this time, it looks like it might actually be worthwhile according
to my perf profiles, where __iommu_unmap() falls a long way down the
profile for a multi-threaded netperf run. I'm still relying on others to
confirm this is useful, however.
Some of the changes since last time are:
* Support for constructing and submitting a list of commands in the
driver
* Numerous changes to the IOMMU and io-pgtable APIs so that we can
submit commands in batches
* Removal of cmpxchg() from cmdq_shared_lock() fast-path
* Code restructuring and cleanups
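The batching idea behind struct iommu_iotlb_gather and iommu_iotlb_gather_add_page() (named in the patch list) can be illustrated with a minimal userspace sketch in C. It is a model only, not the kernel code. The names struct flush_gather, gather_add_page(), gather_sync() and the nr_flushes counter are all hypothetical, and "flushing" here just bumps a counter rather than issuing invalidation commands to an SMMU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a TLB-flush gather: unmapped pages are merged into
 * one [start, end) range and invalidated once, not once per page. */
struct flush_gather {
    uintptr_t start;          /* lowest IOVA gathered so far */
    uintptr_t end;            /* one past the highest IOVA gathered */
    size_t pgsize;            /* granule of gathered pages; 0 means empty */
    unsigned int nr_flushes;  /* hypothetical count of issued invalidations */
};

static void gather_init(struct flush_gather *g)
{
    g->start = UINTPTR_MAX;
    g->end = 0;
    g->pgsize = 0;
    g->nr_flushes = 0;
}

/* Stand-in for "invalidate [start, end) and wait for completion". */
static void gather_sync(struct flush_gather *g)
{
    if (g->pgsize == 0)
        return;               /* nothing gathered, nothing to flush */
    g->nr_flushes++;
    g->start = UINTPTR_MAX;
    g->end = 0;
    g->pgsize = 0;
}

static void gather_add_page(struct flush_gather *g, uintptr_t iova, size_t size)
{
    uintptr_t start = iova, end = iova + size;

    /* A disjoint range or a different granule cannot be merged into the
     * current range, so flush what we have and start a new batch. */
    if (g->pgsize != size || end < g->start || start > g->end) {
        gather_sync(g);
        g->pgsize = size;
    }
    if (start < g->start)
        g->start = start;
    if (end > g->end)
        g->end = end;
}
```

In this sketch, unmapping 16 contiguous 4K pages yields a single flush at gather_sync() time instead of 16. A disjoint address or a different page size forces an early flush, because one [start, end) range with one granule cannot describe both.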
This currently applies against my iommu/devel branch that Joerg has pulled
for 5.3. If you want to test it out, I've put everything here:
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq
Feedback welcome. I appreciate that we're in the merge window, but I
wanted to get this on the list for people to look at as an RFC.
I tested storage performance on this series, which I think is a better
scenario to test than networking, since network performance is generally
limited by the link speed.
Results:
Baseline: will/iommu/devel, commit 9e6ea59f3
Patchset: will/iommu/cmdq

Setup               Baseline     Patchset
8x SAS disks, D05   839K IOPS    835K IOPS
1x NVMe, D05        454K IOPS    472K IOPS
1x NVMe, D06        442K IOPS    459K IOPS
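So we see a small NVMe boost (roughly 4% on both D05 and D06), but about
the same for the 8x SAS disks. Without the IOMMU, 8x disks achieve about
918K IOPS, so that setup is not limited by the medium: with the IOMMU
enabled it is still about 9% below that figure.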
The D06 is a bit memory-starved, which may account for its generally
lower NVMe performance.
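The relative changes quoted above can be recomputed from the table with a few lines of C. The struct and function names are illustrative only. The figures are the measured values from this report.

```c
#include <stdio.h>

/* Measured IOPS (in thousands) from the results table. */
struct result {
    const char *setup;
    double baseline_kiops;  /* will/iommu/devel, commit 9e6ea59f3 */
    double patched_kiops;   /* will/iommu/cmdq */
};

static const struct result results[] = {
    { "8x SAS disks, D05", 839.0, 835.0 },
    { "1x NVMe, D05",      454.0, 472.0 },
    { "1x NVMe, D06",      442.0, 459.0 },
};

/* Relative change in percent: (patched - baseline) / baseline * 100. */
static double pct_change(double baseline, double patched)
{
    return (patched - baseline) / baseline * 100.0;
}

/* Print one line per setup with the signed percentage change. */
static void print_summary(void)
{
    size_t n = sizeof(results) / sizeof(results[0]);

    for (size_t i = 0; i < n; i++)
        printf("%-18s %+.1f%%\n", results[i].setup,
               pct_change(results[i].baseline_kiops,
                          results[i].patched_kiops));
}
```

print_summary() prints -0.5% for 8x SAS on D05, +4.0% for NVMe on D05 and +3.8% for NVMe on D06. pct_change(918.0, 835.0) gives about -9.0% for the patched 8x SAS result relative to the no-IOMMU figure.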
John
Cheers,
Will
--->8
Cc: Jean-Philippe Brucker <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Jayachandran Chandrasekharan Nair <[email protected]>
Cc: Jan Glauber <[email protected]>
Cc: Jon Masters <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Zhen Lei <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Vijay Kilary <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: John Garry <[email protected]>
Cc: Alex Williamson <[email protected]>
Will Deacon (19):
iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops
iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops
iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
iommu: Introduce iommu_iotlb_gather_add_page()
iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync()
iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf()
iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in drivers
iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf()
iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page()
iommu/io-pgtable: Remove unused ->tlb_sync() callback
iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap()
iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page()
iommu/arm-smmu-v3: Separate s/w and h/w views of prod and cons indexes
iommu/arm-smmu-v3: Drop unused 'q' argument from Q_OVF macro
iommu/arm-smmu-v3: Move low-level queue fields out of arm_smmu_queue
iommu/arm-smmu-v3: Operate directly on low-level queue where possible
iommu/arm-smmu-v3: Reduce contention during command-queue insertion
iommu/arm-smmu-v3: Defer TLB invalidation until ->iotlb_sync()
drivers/gpu/drm/panfrost/panfrost_mmu.c |  24 +-
drivers/iommu/amd_iommu.c               |  11 +-
drivers/iommu/arm-smmu-v3.c             | 856 ++++++++++++++++++++++++--------
drivers/iommu/arm-smmu.c                | 103 +++-
drivers/iommu/dma-iommu.c               |   9 +-
drivers/iommu/exynos-iommu.c            |   3 +-
drivers/iommu/intel-iommu.c             |   3 +-
drivers/iommu/io-pgtable-arm-v7s.c      |  57 +--
drivers/iommu/io-pgtable-arm.c          |  48 +-
drivers/iommu/iommu.c                   |  24 +-
drivers/iommu/ipmmu-vmsa.c              |  28 +-
drivers/iommu/msm_iommu.c               |  42 +-
drivers/iommu/mtk_iommu.c               |  45 +-
drivers/iommu/mtk_iommu_v1.c            |   3 +-
drivers/iommu/omap-iommu.c              |   2 +-
drivers/iommu/qcom_iommu.c              |  44 +-
drivers/iommu/rockchip-iommu.c          |   2 +-
drivers/iommu/s390-iommu.c              |   3 +-
drivers/iommu/tegra-gart.c              |  12 +-
drivers/iommu/tegra-smmu.c              |   2 +-
drivers/vfio/vfio_iommu_type1.c         |  27 +-
include/linux/io-pgtable.h              |  57 ++-
include/linux/iommu.h                   |  92 +++-
23 files changed, 1090 insertions(+), 407 deletions(-)
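The cover letter also mentions removing cmpxchg() from the cmdq_shared_lock() fast path (see the "Reduce contention during command-queue insertion" patch). Below is a minimal userspace sketch of that general technique using C11 atomics. It rests on a few assumptions. struct cmdq_lock and all function names are hypothetical. The memory orderings are chosen conservatively for the model. None of it is the arm-smmu-v3 driver's code. Taking the shared lock is a single fetch-and-add, and the caller falls back to a compare-and-swap loop only when the counter is negative, which encodes an exclusive holder as INT_MIN.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared/exclusive lock model:
 * val > 0: that many shared holders; val == 0: unlocked;
 * val < 0: held exclusively (set to INT_MIN by the exclusive owner). */
struct cmdq_lock {
    atomic_int val;
};

static void cmdq_lock_init(struct cmdq_lock *l)
{
    atomic_init(&l->val, 0);
}

static void cmdq_shared_lock(struct cmdq_lock *l)
{
    int v;

    /* Fast path: one unconditional increment, no cmpxchg loop. If the lock
     * is held exclusively the value stays negative, and this stray
     * increment is discarded when the exclusive owner stores 0. */
    if (atomic_fetch_add_explicit(&l->val, 1, memory_order_acquire) >= 0)
        return;

    /* Slow path: wait for the exclusive owner to release, then take a
     * shared reference with compare-and-swap. */
    do {
        while ((v = atomic_load_explicit(&l->val, memory_order_relaxed)) < 0)
            ;
    } while (!atomic_compare_exchange_weak_explicit(&l->val, &v, v + 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed));
}

static void cmdq_shared_unlock(struct cmdq_lock *l)
{
    atomic_fetch_sub_explicit(&l->val, 1, memory_order_release);
}

/* Exclusive acquisition only succeeds when there are no shared holders. */
static bool cmdq_exclusive_trylock(struct cmdq_lock *l)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(&l->val, &expected, INT_MIN,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void cmdq_exclusive_unlock(struct cmdq_lock *l)
{
    /* Storing 0 also wipes increments left by failed shared fast paths. */
    atomic_store_explicit(&l->val, 0, memory_order_release);
}
```

In this model the exclusive side is a trylock, and its unlock stores 0 rather than adding anything back. That is what allows the fast-path increments made while the lock was exclusively held to be discarded safely.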
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu