membarrier_global_expedited() reads the mm->membarrier_state of each
runqueue's current task to test a flag. A concurrent task exit can
cause the memory backing that struct mm to be freed before the read is
performed. Therefore, use probe_kernel_address() to read the flag. If
the read faults, treat the flag as unset: the struct mm has been freed,
which means the scheduler code already provides the barriers required
by membarrier, and sending an IPI to that CPU would be redundant.
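
For context only, and not part of the patch itself, the pattern boils
down to something like the sketch below. It assumes the
probe_kernel_address() macro from <linux/uaccess.h> as available in the
targeted kernels; the helper name read_membarrier_flag() is made up for
illustration.

#include <linux/uaccess.h>      /* probe_kernel_address() */
#include <linux/mm_types.h>     /* struct mm_struct */

/*
 * Return non-zero if @flag is set in mm->membarrier_state; return 0 if
 * the flag is clear or if the read faults because the struct mm has
 * already been freed and unmapped.
 */
static int read_membarrier_flag(struct mm_struct *mm, int flag)
{
        int membarrier_state;

        /*
         * probe_kernel_address() copies the value with page faults
         * disabled and returns -EFAULT instead of crashing when the
         * memory backing @mm is no longer mapped.
         */
        if (probe_kernel_address(&mm->membarrier_state.counter,
                                 membarrier_state))
                membarrier_state = 0;   /* failed read: treat as unset */

        return membarrier_state & flag;
}

A failed read is safe to treat as "unset" here because it can only
happen once the task's mm has been torn down, at which point the
scheduler's own barriers cover that CPU and no IPI is needed.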

There is ongoing discussion about removing the need for
probe_kernel_address() in this code by providing additional existence
guarantees for struct mm. This patch is submitted as a fix intended to
be backported to prior stable kernel releases.

Signed-off-by: Mathieu Desnoyers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Russell King - ARM Linux admin <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
 kernel/sched/membarrier.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 02feb7c8da4f..0312604d8315 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -58,6 +58,8 @@ static int membarrier_global_expedited(void)
        cpus_read_lock();
        for_each_online_cpu(cpu) {
                struct task_struct *p;
+               struct mm_struct *mm;
+               int membarrier_state;
 
                /*
                 * Skipping the current CPU is OK even though we can be
@@ -72,17 +74,34 @@ static int membarrier_global_expedited(void)
 
                rcu_read_lock();
                p = task_rcu_dereference(&cpu_rq(cpu)->curr);
-               if (p) {
-                       struct mm_struct *mm = READ_ONCE(p->mm);
+               if (!p)
+                       goto next;
+               mm = READ_ONCE(p->mm);
+               if (!mm)
+                       goto next;
 
-                       if (mm && (atomic_read(&mm->membarrier_state) &
-                                  MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
-                               if (!fallback)
-                                       __cpumask_set_cpu(cpu, tmpmask);
-                               else
-                       smp_call_function_single(cpu, ipi_mb, NULL, 1);
-                       }
+               /*
+                * Using probe_kernel_address() of membarrier_state instead of
+                * an atomic_read() to deal with the fact that mm may have been
+                * concurrently freed. If probe_kernel_address fails, it means
+                * the mm struct was freed, so there is no need to issue a
+                * barrier on that particular CPU, because the scheduler code
+                * is taking care of it.
+                *
+                * It does not matter whether probe_kernel_address decides to
+                * read membarrier_state piece-wise, given that we only care
+                * about testing a single bit.
+                */
+               if (probe_kernel_address(&mm->membarrier_state.counter,
+                                        membarrier_state))
+                       membarrier_state = 0;
+               if (membarrier_state & MEMBARRIER_STATE_GLOBAL_EXPEDITED) {
+                       if (!fallback)
+                               __cpumask_set_cpu(cpu, tmpmask);
+                       else
+                               smp_call_function_single(cpu, ipi_mb, NULL, 1);
                }
+       next:
                rcu_read_unlock();
        }
        if (!fallback) {
-- 
2.17.1
