On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote:
> On 11/07/2019 18:19, Will Deacon wrote:
> > This is a significant rework of the RFC I previously posted here:
> > 
> >   https://lkml.kernel.org/r/[email protected]
> > 
> > But this time, it looks like it might actually be worthwhile according
> > to my perf profiles, where __iommu_unmap() falls a long way down the
> > profile for a multi-threaded netperf run. I'm still relying on others to
> > confirm this is useful, however.
> > 
> > Some of the changes since last time are:
> > 
> >   * Support for constructing and submitting a list of commands in the
> >     driver (the general idea is sketched just after this list)
> > 
> >   * Numerous changes to the IOMMU and io-pgtable APIs so that we can
> >     submit commands in batches
> > 
> >   * Removal of cmpxchg() from cmdq_shared_lock() fast-path (also
> >     sketched after this list)
> > 
> >   * Code restructuring and cleanups
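
To make the batching idea concrete, here is a minimal, compilable C
sketch of the general pattern. All names in it (struct cmd, cmd_batch,
batch_add, queue_insert_cmds, BATCH_MAX) are illustrative stand-ins
rather than the driver's actual API: commands accumulate in a local
fixed-size buffer, and the cost of inserting into the hardware queue
(locking, barriers, doorbell write) is paid once per batch instead of
once per command.

    #define BATCH_MAX 64

    struct cmd {
            unsigned long long dword[2];    /* one 16-byte queue command */
    };

    /* Stand-in for the real ring-buffer insertion + doorbell write. */
    static void queue_insert_cmds(const struct cmd *cmds, int n)
    {
            (void)cmds;
            (void)n;
    }

    struct cmd_batch {
            struct cmd cmds[BATCH_MAX];
            int num;
    };

    static void batch_flush(struct cmd_batch *b)
    {
            /* One queue insertion for the whole list. */
            if (b->num)
                    queue_insert_cmds(b->cmds, b->num);
            b->num = 0;
    }

    static void batch_add(struct cmd_batch *b, const struct cmd *c)
    {
            if (b->num == BATCH_MAX)
                    batch_flush(b);
            b->cmds[b->num++] = *c;
    }

Unmapping N pages then costs one insertion for the N invalidation
commands plus a sync, rather than N separate insertions.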
> > 
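Similarly, here is a userspace C11 analogue of the cmdq_shared_lock()
change -- a sketch of the pattern as I read it, not the kernel code
itself. The shared fast path is a single atomic increment; a cmpxchg()
loop survives only on the contended slow path. The exclusive holder
parks the counter at INT_MIN so that concurrent increments leave it
negative.

    #include <limits.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    struct cmdq_lock {
            atomic_int lock;    /* >= 0: held shared by that many owners */
    };

    static void cmdq_shared_lock(struct cmdq_lock *q)
    {
            int val;

            /* Fast path: no cmpxchg(), just bump the owner count. */
            if (atomic_fetch_add_explicit(&q->lock, 1,
                                          memory_order_acquire) >= 0)
                    return;

            /*
             * Slow path: an exclusive holder owns the queue. Our increment
             * above gets wiped by the exclusive unlock (which stores 0), so
             * wait for the counter to go non-negative and re-add ourselves.
             */
            do {
                    while ((val = atomic_load_explicit(&q->lock,
                                            memory_order_relaxed)) < 0)
                            ;       /* spin */
            } while (!atomic_compare_exchange_weak_explicit(&q->lock, &val,
                                            val + 1,
                                            memory_order_acquire,
                                            memory_order_relaxed));
    }

    static void cmdq_shared_unlock(struct cmdq_lock *q)
    {
            atomic_fetch_sub_explicit(&q->lock, 1, memory_order_release);
    }

    static bool cmdq_exclusive_trylock(struct cmdq_lock *q)
    {
            int unlocked = 0;

            /* Exclusive access only when there are no shared owners. */
            return atomic_compare_exchange_strong_explicit(&q->lock,
                                            &unlocked, INT_MIN,
                                            memory_order_acquire,
                                            memory_order_relaxed);
    }

    static void cmdq_exclusive_unlock(struct cmdq_lock *q)
    {
            /*
             * Clears the INT_MIN bias and any stray increments from
             * spinning waiters; they re-add themselves in the slow path.
             */
            atomic_store_explicit(&q->lock, 0, memory_order_release);
    }

Since the shared path is taken on every command submission, the hot path
becomes a single uncontended atomic add.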
> > This currently applies against my iommu/devel branch that Joerg has pulled
> > for 5.3. If you want to test it out, I've put everything here:
> > 
> >   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq
> > 
> > Feedback welcome. I appreciate that we're in the merge window, but I
> > wanted to get this on the list for people to look at as an RFC.
> > 
> 
> I tested storage performance on this series, which I think is a better
> scenario to test than network performance, since the latter is generally
> limited by the network link speed.

Interesting, thanks for sharing. Do you also see a drop in CPU time
similar to the one reported by Ganapat?

> Baseline performance (will/iommu/devel, commit 9e6ea59f3):
>   8x SAS disks, D05     839K IOPS
>   1x NVMe, D05          454K IOPS
>   1x NVMe, D06          442K IOPS
> 
> Patchset performance (will/iommu/cmdq):
>   8x SAS disks, D05     835K IOPS
>   1x NVMe, D05          472K IOPS
>   1x NVMe, D06          459K IOPS
> 
> So we see a bit of a boost for NVMe, but about the same for the 8x SAS
> disks. With the IOMMU disabled, performance is about 918K IOPS for the 8x
> disks, so we are not limited by the medium.

It would be nice to know whether this performance gap is down to Linux or
simply to the translation overhead in the SMMU hardware. Are you able to
get a perf profile to see where we're spending time?
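
For reference, something along the lines of:

  perf record -a -g -- <your I/O workload>
  perf report

should be enough to tell whether the cycles are going into the driver's
command-queue insertion path, or whether the profile looks flat and the
gap is simply translation latency in the SMMU itself.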

Thanks,

Will