On 7/2/19 10:45 PM, [email protected] wrote:

> However, we found that as the number of TLB flush calls increased,
> the noise also increased. We understood the cause of this issue to be
> the implementation of Linux's TLB flush for arm64, especially its use of
> the TLBI-is instruction, which is broadcast to all processor cores on
> the system.

Are you saying that for a microbenchmark in which very large numbers of
threads are created and destroyed rapidly there are a large number of
associated TLB range flushes, which always use broadcast TLBIs?

If that's the case, and the hardware doesn't do any ASID filtering and
each TLBI results in a DVM message to every PE, would it make sense to
look at whether there are ways to improve batching, or to switch to an
IPI approach rather than relying on broadcasts, as a more generic
solution?
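
To make the comparison concrete, here is a rough toy cost model. It is
not kernel code, and every name and parameter in it is hypothetical. It
assumes one broadcast TLBI per page that reaches every PE, versus one
batched IPI per range flush sent only to the other PEs that have run the
address space. It only counts messages and ignores the fact that an IPI
is considerably more expensive to handle than a DVM message.

```c
#include <stdint.h>

/*
 * Toy cost model, not kernel code. All function and parameter names
 * are hypothetical and exist only for illustration.
 */

/*
 * Broadcast model: one TLBI per page in the range, and each TLBI is
 * assumed to be delivered to every PE in the system (no ASID filtering).
 */
uint64_t broadcast_msgs(uint64_t flushes, uint64_t pages_per_flush,
                        uint64_t total_pes)
{
    return flushes * pages_per_flush * total_pes;
}

/*
 * Batched IPI model: one IPI per range flush, sent only to the other
 * PEs that have run this address space. Each target is assumed to
 * invalidate the whole range locally on receipt.
 */
uint64_t ipi_msgs(uint64_t flushes, uint64_t pes_running_mm)
{
    uint64_t targets = pes_running_mm > 0 ? pes_running_mm - 1 : 0;

    return flushes * targets;
}
```

With 1000 range flushes of 16 pages each on a 48-PE system, the
broadcast model counts 768000 messages, while the IPI model with 4 PEs
running the mm counts 3000. Whether that translates into less noise
would depend on the real per-message costs, which this sketch does not
attempt to capture.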

Jon.
