On 2021-01-27 20:00, Chuck Lever wrote:
Hi-

This collection of patches seems to get the best throughput results
so far. The NFS WRITE result is fully restored, and the NFS READ
result is very close to fully restored.

        Children see throughput for 12 initial writers  = 5008474.03 kB/sec
        Parent sees throughput for 12 initial writers   = 4996927.80 kB/sec
        Min throughput per process                      = 416956.88 kB/sec
        Max throughput per process                      = 417910.22 kB/sec
        Avg throughput per process                      = 417372.84 kB/sec
        Min xfer                                        = 1046272.00 kB
        CPU utilization: Wall time    2.515    CPU time    1.996    CPU utilization  79.37 %


        Children see throughput for 12 rewriters        = 5020584.59 kB/sec
        Parent sees throughput for 12 rewriters         = 5012539.29 kB/sec
        Min throughput per process                      = 417799.00 kB/sec
        Max throughput per process                      = 419082.22 kB/sec
        Avg throughput per process                      = 418382.05 kB/sec
        Min xfer                                        = 1046528.00 kB
        CPU utilization: Wall time    2.507    CPU time    2.024    CPU utilization  80.73 %


        Children see throughput for 12 readers          = 5805484.25 kB/sec
        Parent sees throughput for 12 readers           = 5799535.68 kB/sec
        Min throughput per process                      = 482888.16 kB/sec
        Max throughput per process                      = 484444.16 kB/sec
        Avg throughput per process                      = 483790.35 kB/sec
        Min xfer                                        = 1045760.00 kB
        CPU utilization: Wall time    2.167    CPU time    1.964    CPU utilization  90.63 %


        Children see throughput for 12 re-readers       = 5812227.16 kB/sec
        Parent sees throughput for 12 re-readers        = 5803793.06 kB/sec
        Min throughput per process                      = 483242.97 kB/sec
        Max throughput per process                      = 485724.41 kB/sec
        Avg throughput per process                      = 484352.26 kB/sec
        Min xfer                                        = 1043456.00 kB
        CPU utilization: Wall time    2.161    CPU time    1.976    CPU utilization  91.45 %

I've included a simple-minded implementation of a map_sg op for
the Intel IOMMU. This is nothing more than a copy of the loop in
__iommu_map_sg() with the call to __iommu_map() replaced with a
call to intel_iommu_map().

...which is the main reason I continue to strongly dislike patches #4-#9 (#3 definitely seems to make sense either way, now that #1 and #2 are going to land). If a common operation is worth optimising anywhere, then it deserves optimising everywhere; otherwise we end up with a dozen diverging copies of essentially the same code - particularly when the driver-specific functionality *is* already in the drivers, so what gets duplicated is solely the "generic" parts.

And if there's justification for pushing iommu_map_sg() entirely into drivers, then it's verging on self-contradictory not to do the same for iommu_map() and iommu_unmap(). Some IOMMU drivers - mainly intel-iommu, as it happens - are already implementing hacks around the "one call per page" interface being inherently inefficient, so the logical thing to do here is take a step back and reconsider the fundamental design of the whole map/unmap interface. Implementing hacks on top of hacks to make particular things faster on particular systems that particular people care about is not going to do us any favours in the long run.

As it stands, I can easily see a weird anti-pattern emerging where people start adding code to fake up scatterlists in random drivers because they see dma_map_sg() performing paradoxically better than dma_map_page().

Robin.

---

Chuck Lever (1):
       iommu/vt-d: Introduce map_sg() for Intel IOMMUs

Isaac J. Manjarres (5):
       iommu/io-pgtable: Introduce map_sg() as a page table op
       iommu/io-pgtable-arm: Hook up map_sg()
       iommu/io-pgtable-arm-v7s: Hook up map_sg()
       iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
       iommu/arm-smmu: Hook up map_sg()

Lu Baolu (1):
       iommu/vt-d: Add iotlb_sync_map callback

Yong Wu (2):
       iommu: Move iotlb_sync_map out from __iommu_map
       iommu: Add iova and size as parameters in iotlb_sync_map


  drivers/iommu/arm/arm-smmu/arm-smmu.c |  19 ++++
  drivers/iommu/intel/iommu.c           | 131 ++++++++++++++++++++------
  drivers/iommu/io-pgtable-arm-v7s.c    |  90 ++++++++++++++++++
  drivers/iommu/io-pgtable-arm.c        |  86 +++++++++++++++++
  drivers/iommu/iommu.c                 |  47 +++++++--
  drivers/iommu/tegra-gart.c            |   7 +-
  include/linux/iommu.h                 |  16 +++-
  7 files changed, 353 insertions(+), 43 deletions(-)

--
Chuck Lever

_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
