On 15/08/2019 14:55, Will Deacon wrote:
On Thu, Aug 15, 2019 at 12:19:58PM +0100, John Garry wrote:
On 14/08/2019 18:56, Will Deacon wrote:
If you'd like to play with the patches, then I've also pushed them here:
but they should behave as a no-op on their own.
As anticipated, my storage testing scenarios roughly give parity throughput
and CPU loading before and after this series.
Patches to convert the
Arm SMMUv3 driver to the new API are here:
I quickly tested this again and now I see a performance lift:
                    before (5.3-rc1)    after
D05 8x SAS disks    907K IOPS           970K IOPS
D05 1x NVMe         450K IOPS           466K IOPS
D06 1x NVMe         467K IOPS           466K IOPS
The CPU loading seems to track throughput, so nothing much to say there.
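
To put the lift in relative terms, the before/after figures quoted in this mail can be turned into percentages. This is only a rough sketch: the dictionary and function names are made up for illustration, and the numbers are the K IOPS values from the table, nothing more.

```python
# Throughput in thousands of IOPS, copied from the before/after table:
# (before on 5.3-rc1, after with the series applied).
RESULTS_KIOPS = {
    "D05 8x SAS disks": (907, 970),
    "D05 1x NVMe": (450, 466),
    "D06 1x NVMe": (467, 466),
}


def percent_change(before, after):
    """Return the relative change from before to after, in percent."""
    return (after - before) / before * 100.0


for name, (before, after) in RESULTS_KIOPS.items():
    change = percent_change(before, after)
    print(f"{name:18s} {before}K -> {after}K IOPS ({change:+.1f}%)")
```

Read this way, the SAS setup gains the most, the D05 NVMe case gains a few percent, and the D06 NVMe case is effectively unchanged.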
Note: From 5.2 testing, I was seeing >900K IOPS from that NVMe disk for
Cheers, John. For interest, how do things look if you pass iommu.strict=0?
That might give some indication about how much the invalidation is still
So I tested for iommu/cmdq for NVMe only, and I see:
            !SMMU        5.3-rc4 strict/!strict    cmdq strict/!strict
D05 NVMe    750K IOPS    456K/540K IOPS            466K/537K
D06 NVMe    750K IOPS    456K/740K IOPS            466K/745K
I don't know why the D06 iommu.strict performance is about the same as on
D05, while !strict is so much better. The D06 SMMU implementation is supposed
to be generally much better than that of D05, so I would have expected its
strict performance to be better than D05's as well.
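
One way to read the numbers is as the share of the !SMMU throughput that each mode retains. The sketch below does that arithmetic for the cmdq column only. It assumes "!SMMU" is the baseline with the SMMU out of the picture, and the names are hypothetical, chosen just for this example.

```python
# Baseline throughput without the SMMU, in thousands of IOPS.
BASELINE_KIOPS = {"D05": 750, "D06": 750}

# cmdq results as (strict, non-strict), in thousands of IOPS.
CMDQ_KIOPS = {"D05": (466, 537), "D06": (466, 745)}


def retained_percent(kiops, baseline):
    """Return kiops as a percentage of the baseline throughput."""
    return kiops / baseline * 100.0


for board, (strict, non_strict) in CMDQ_KIOPS.items():
    base = BASELINE_KIOPS[board]
    print(
        f"{board}: strict keeps {retained_percent(strict, base):.1f}%, "
        f"!strict keeps {retained_percent(non_strict, base):.1f}% of !SMMU"
    )
```

On these figures strict mode retains the same fraction on both boards, while non-strict on D06 comes close to the !SMMU number, which is the discrepancy described above.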
BTW, what were your thoughts on changing
arm_smmu_atc_inv_domain()->arm_smmu_atc_inv_master() to batching? It seems
suitable, but looks untouched. Were you waiting for a resolution to the
performance issue which Leizhen reported?
In principle, I'm supportive of such a change, but I'm not currently able
to test any ATS stuff so somebody else would need to write the patch.
Jean-Philippe is on holiday at the moment, but I'd be happy to review
something from you if you send it out.
Unfortunately I don't have anything ATS-enabled either. Not many do, it