On 21/09/2020 14:58, John Garry wrote:
> On 21/09/2020 14:43, Will Deacon wrote:
>> On Fri, Aug 21, 2020 at 09:54:20PM +0800, John Garry wrote:
>>> As mentioned in [0], the CPU may consume many cycles processing
>>> arm_smmu_cmdq_issue_cmdlist(). One issue we find is the cmpxchg() loop to
>>> get space on the queue takes a lot of time once we start getting many
>>> CPUs contending - from experiment, for 64 CPUs contending the cmdq,
>>> success rate is ~ 1 in 12, which is poor, but not totally awful.
>>>
>>> This series removes that cmpxchg() and replaces with an atomic_add,
>>> same as how the actual cmdq deals with maintaining the prod pointer.
>>
>> I'm still not a fan of this.
>
> :(
>
>> Could you try to adapt the hacks I sent before,
>> please? I know they weren't quite right (I have no hardware to test on), but
>> the basic idea is to fall back to a spinlock if the cmpxchg() fails. The
>> queueing in the spinlock implementation should avoid the contention.
>
> OK, so if you're asking me to try this again, then I can do that, and see what it gives us.
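
For anyone following along, the fallback idea above (try the cmpxchg() once, and take a lock if it fails) could look roughly like the userspace sketch below. This is only an illustration under stated assumptions: it is not the hack Will sent and not the driver code. struct demo_queue, demo_queue_init() and demo_queue_reserve() are hypothetical names, a pthread mutex stands in for the kernel spinlock, and queue-full handling is left out entirely.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the cmdq producer state; not the driver's layout. */
struct demo_queue {
	_Atomic uint32_t prod;   /* next free slot index */
	pthread_mutex_t lock;    /* stand-in for the kernel spinlock */
};

static void demo_queue_init(struct demo_queue *q)
{
	atomic_init(&q->prod, 0);
	pthread_mutex_init(&q->lock, NULL);
}

/*
 * Reserve n slots and return the index of the first one.
 * Fast path: a single compare-and-swap attempt.
 * Slow path: losers queue on the lock, so at most one of them at a
 * time competes with the fast-path callers.
 */
static uint32_t demo_queue_reserve(struct demo_queue *q, uint32_t n)
{
	uint32_t old = atomic_load_explicit(&q->prod, memory_order_relaxed);

	if (atomic_compare_exchange_strong(&q->prod, &old, old + n))
		return old;

	pthread_mutex_lock(&q->lock);
	/*
	 * A failed compare-and-swap refreshes 'old' with the current value.
	 * Fast-path callers do not take the lock, so we still have to loop.
	 */
	while (!atomic_compare_exchange_weak(&q->prod, &old, old + n))
		;
	pthread_mutex_unlock(&q->lock);

	return old;
}
```

Whether serialising the losers like this helps on real hardware is exactly what would need measuring; the sketch only shows the control flow.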


JFYI, to prove that this is not a problem which affects only our HW, I managed to test an arm64 platform from another vendor. Generally I see the same issue, and this patchset actually helps that platform even more.

                CPUs    Before  After   % Increase
Huawei D06      8       282K    302K    7%
Other                   379K    420K    11%

Huawei D06      16      115K    193K    68%
Other                   102K    291K    185%

Huawei D06      32      36K     80K     122%
Other                   41K     156K    280%

Huawei D06      64      11K     30K     172%
Other                   6K      47K     683%

I tested with something like [1], so the unit is map+unmap pairs per CPU per second - higher is better.
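
For reference, the last column of the table is the relative gain, (After - Before) / Before. A minimal sketch of that arithmetic follows; pct_increase() is just an illustrative helper name, and since the Before/After figures are rounded to the nearest K, recomputing from them can differ from the table by a point or so.

```c
/* Relative gain of 'after' over 'before', as a percentage. */
static double pct_increase(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For example, pct_increase(115, 193) is about 67.8 and pct_increase(102, 291) is about 185.3, which is where the 16-CPU rows come from.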

My D06 is memory poor, so I would expect higher results with more memory. Indeed, my D05 has memory on all nodes and performs better.

Anyway, I see that the implementation here is not perfect, and I could not get the suggested approach to improve performance significantly. So back to the drawing board...

Thanks,
John

[1] https://lore.kernel.org/linux-iommu/[email protected]/
