Re: [PATCH] sched/numa: use down_read_trylock for mmap_sem

Peter Zijlstra Tue, 16 May 2017 01:17:28 -0700

On Mon, May 15, 2017 at 03:13:16PM +0200, Vlastimil Babka wrote:
> A customer has reported a soft-lockup when running a proprietary intensive
> memory stress test, where the trace on multiple CPUs looks like this:
> 
>  RIP: 0010:[<ffffffff810c53fe>]
>   [<ffffffff810c53fe>] native_queued_spin_lock_slowpath+0x10e/0x190
> ...
>  Call Trace:
>   [<ffffffff81182d07>] queued_spin_lock_slowpath+0x7/0xa
>   [<ffffffff811bc331>] change_protection_range+0x3b1/0x930
>   [<ffffffff811d4be8>] change_prot_numa+0x18/0x30
>   [<ffffffff810adefe>] task_numa_work+0x1fe/0x310
>   [<ffffffff81098322>] task_work_run+0x72/0x90
> 
> Further investigation showed that the lock contention here is pmd_lock().
> 
> The task_numa_work() function makes sure that only one thread is allowed to
> perform the work in a single scan period (via cmpxchg), but if there's a
> thread with mmap_sem locked for writing for several periods, multiple threads
> in task_numa_work() can build up a convoy waiting for mmap_sem for read and
> then all get unblocked at once.
> 
> This patch changes the down_read() to the trylock version, which prevents the
> build up. For a workload experiencing mmap_sem contention, it's probably
> better to postpone the NUMA balancing work anyway. This seems to have fixed
> the soft lockups involving pmd_lock(), which is in line with the convoy
> theory.
> 
> Signed-off-by: Vlastimil Babka <[email protected]>


Thanks!
