On 2021/1/28 22:52, Chuck Lever wrote:
On Jan 28, 2021, at 8:59 AM, Robin Murphy <[email protected]> wrote:
On 2021-01-27 20:00, Chuck Lever wrote:
Hi-
This collection of patches seems to get the best throughput results
so far. The NFS WRITE result is fully restored, and the NFS READ
result is very close to fully restored.
Children see throughput for 12 initial writers = 5008474.03 kB/sec
Parent sees throughput for 12 initial writers = 4996927.80 kB/sec
Min throughput per process = 416956.88 kB/sec
Max throughput per process = 417910.22 kB/sec
Avg throughput per process = 417372.84 kB/sec
Min xfer = 1046272.00 kB
CPU Utilization: Wall time 2.515 CPU time 1.996 CPU utilization 79.37 %
Children see throughput for 12 rewriters = 5020584.59 kB/sec
Parent sees throughput for 12 rewriters = 5012539.29 kB/sec
Min throughput per process = 417799.00 kB/sec
Max throughput per process = 419082.22 kB/sec
Avg throughput per process = 418382.05 kB/sec
Min xfer = 1046528.00 kB
CPU utilization: Wall time 2.507 CPU time 2.024 CPU utilization 80.73 %
Children see throughput for 12 readers = 5805484.25 kB/sec
Parent sees throughput for 12 readers = 5799535.68 kB/sec
Min throughput per process = 482888.16 kB/sec
Max throughput per process = 484444.16 kB/sec
Avg throughput per process = 483790.35 kB/sec
Min xfer = 1045760.00 kB
CPU utilization: Wall time 2.167 CPU time 1.964 CPU utilization 90.63 %
Children see throughput for 12 re-readers = 5812227.16 kB/sec
Parent sees throughput for 12 re-readers = 5803793.06 kB/sec
Min throughput per process = 483242.97 kB/sec
Max throughput per process = 485724.41 kB/sec
Avg throughput per process = 484352.26 kB/sec
Min xfer = 1043456.00 kB
CPU utilization: Wall time 2.161 CPU time 1.976 CPU utilization 91.45 %
I've included a simple-minded implementation of a map_sg op for
the Intel IOMMU. This is nothing more than a copy of the loop in
__iommu_map_sg() with the call to __iommu_map() replaced with a
call to intel_iommu_map().
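For reference, the pattern Chuck describes above is roughly the following. This is an illustrative, self-contained sketch, not the actual kernel code: the struct and the fake_iommu_map() stub are simplified stand-ins for the real scatterlist and intel_iommu_map(), and real code would unmap the already-mapped prefix on failure.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for kernel types -- not the real
 * <linux/scatterlist.h> definitions. */
struct scatterlist {
	unsigned long phys;   /* physical address of this segment */
	size_t        length; /* segment length in bytes */
};

/* Stand-in for the driver's single-mapping primitive (e.g.
 * intel_iommu_map()); here it just pretends every mapping succeeds. */
static int fake_iommu_map(unsigned long iova, unsigned long phys,
			  size_t size)
{
	(void)iova; (void)phys; (void)size;
	return 0; /* 0 == success, per the kernel convention */
}

/*
 * The __iommu_map_sg()-style loop, with the indirect __iommu_map()
 * call replaced by a direct call into the driver.  Returns the total
 * number of bytes mapped, or 0 on failure (real code would then
 * unmap [iova, iova + mapped) before returning).
 */
static size_t driver_map_sg(unsigned long iova, struct scatterlist *sg,
			    unsigned int nents)
{
	size_t mapped = 0;
	unsigned int i;

	for (i = 0; i < nents; i++) {
		if (fake_iommu_map(iova + mapped, sg[i].phys,
				   sg[i].length))
			return 0;
		mapped += sg[i].length;
	}
	return mapped;
}
```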
...which is the main reason I continue to strongly dislike patches #4-#9 (#3 definitely
seems to make sense either way, now that #1 and #2 are going to land). If a common
operation is worth optimising anywhere, then it deserves optimising everywhere, so we end
up with a dozen diverging copies of essentially the same code - particularly when the
driver-specific functionality *is* already in the drivers, so what gets duplicated is
solely the "generic" parts.
I don't disagree with that assessment, but I don't immediately see an
alternative API arrangement that would be more successful in the short
term. If 4/9 - 9/9 are not acceptable, then the responsible thing to
do would be to revert:
- 58a8bb39490d ("iommu/vt-d: Cleanup after converting to dma-iommu ops")
- c588072bba6b ("iommu/vt-d: Convert intel iommu driver to the iommu ops")
for v5.11, work out the proper API design, and then try the VT-d conversion
again.
Perhaps we could introduce an iommu_ops->map/unmap_range() callback and
let the iommu driver choose between the "one call per page" map() and
simply mapping a range in a loop.
Best regards,
baolu
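[A rough sketch of the dispatch Baolu suggests. Everything here is hypothetical: the struct name, the map_range field, and the stand-in types are illustrative only and do not exist in any mainline tree as of this thread. The core routine prefers a driver-supplied range callback and falls back to the existing per-segment loop.]

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins -- not the real kernel types. */
struct scatterlist {
	unsigned long phys;
	size_t        length;
};

struct iommu_ops_sketch {
	/* the existing "one call per page/segment" primitive */
	int (*map)(unsigned long iova, unsigned long phys, size_t size);
	/* optional range primitive a driver may provide instead */
	size_t (*map_range)(unsigned long iova, struct scatterlist *sg,
			    unsigned int nents);
};

/*
 * If the driver supplies a range callback, use it; otherwise fall
 * back to the generic loop over the single-mapping op.  Returns
 * bytes mapped, or 0 on failure.
 */
static size_t iommu_map_sg_sketch(const struct iommu_ops_sketch *ops,
				  unsigned long iova,
				  struct scatterlist *sg,
				  unsigned int nents)
{
	size_t mapped = 0;
	unsigned int i;

	if (ops->map_range)
		return ops->map_range(iova, sg, nents);

	for (i = 0; i < nents; i++) {
		if (ops->map(iova + mapped, sg[i].phys, sg[i].length))
			return 0;
		mapped += sg[i].length;
	}
	return mapped;
}
```

Either way, callers see one entry point, and only drivers with something to gain need to implement the range path.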
IMHO.
And if there's justification for pushing iommu_map_sg() entirely into drivers, then it's
verging on self-contradictory not to do the same for iommu_map() and iommu_unmap(). Some
IOMMU drivers - mainly intel-iommu, as it happens - are already implementing hacks around
the "one call per page" interface being inherently inefficient, so the logical
thing to do here is take a step back and reconsider the fundamental design of the whole
map/unmap interface. Implementing hacks on top of hacks to make particular things faster
on particular systems that particular people care about is not going to do us any favours
in the long run.
As it stands, I can easily see a weird anti-pattern emerging where people start
adding code to fake up scatterlists in random drivers because they see
dma_map_sg() performing paradoxically better than dma_map_page().
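[A minimal illustration of that anti-pattern. All names here are hypothetical stand-ins for the DMA API, purely to show the shape of the misuse: wrapping a single page in a one-entry scatterlist solely because the sg path happens to be faster.]

```c
#include <stddef.h>

/* Hypothetical stand-ins for the DMA API -- illustrative only. */
struct scatterlist {
	unsigned long page;
	size_t        length;
	unsigned int  offset;
};

static unsigned long fake_dma_map_page(unsigned long page,
				       unsigned int offset, size_t size)
{
	(void)size;
	return page + offset; /* pretend identity mapping */
}

static int fake_dma_map_sg(struct scatterlist *sg, int nents)
{
	(void)sg;
	return nents; /* number of entries successfully mapped */
}

/* The straightforward way: one page, one call. */
static unsigned long map_one_page(unsigned long page, size_t size)
{
	return fake_dma_map_page(page, 0, size);
}

/*
 * The anti-pattern: same single page, but dressed up as a one-entry
 * scatterlist to chase the faster dma_map_sg() path.
 */
static int map_one_page_the_wrong_way(unsigned long page, size_t size)
{
	struct scatterlist sg = { page, size, 0 };

	return fake_dma_map_sg(&sg, 1); /* should just be map_page */
}
```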
Robin.
---
Chuck Lever (1):
iommu/vt-d: Introduce map_sg() for Intel IOMMUs
Isaac J. Manjarres (5):
iommu/io-pgtable: Introduce map_sg() as a page table op
iommu/io-pgtable-arm: Hook up map_sg()
iommu/io-pgtable-arm-v7s: Hook up map_sg()
iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
iommu/arm-smmu: Hook up map_sg()
Lu Baolu (1):
iommu/vt-d: Add iotlb_sync_map callback
Yong Wu (2):
iommu: Move iotlb_sync_map out from __iommu_map
iommu: Add iova and size as parameters in iotlb_sync_map
drivers/iommu/arm/arm-smmu/arm-smmu.c | 19 ++++
drivers/iommu/intel/iommu.c | 131 ++++++++++++++++++++------
drivers/iommu/io-pgtable-arm-v7s.c | 90 ++++++++++++++++++
drivers/iommu/io-pgtable-arm.c | 86 +++++++++++++++++
drivers/iommu/iommu.c | 47 +++++++--
drivers/iommu/tegra-gart.c | 7 +-
include/linux/iommu.h | 16 +++-
7 files changed, 353 insertions(+), 43 deletions(-)
--
Chuck Lever
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu