Commit-ID:  42f930da7f00c0ab23df4c7aed36137f35988980
Gitweb:     https://git.kernel.org/tip/42f930da7f00c0ab23df4c7aed36137f35988980
Author:     Don Zickus <[email protected]>
AuthorDate: Wed, 1 Nov 2017 14:11:27 -0400
Committer:  Thomas Gleixner <[email protected]>
CommitDate: Wed, 1 Nov 2017 21:18:40 +0100

watchdog/hardlockup/perf: Use atomics to track in-use cpu counter

Guenter reported:
  There is still a problem. When running 
    echo 6 > /proc/sys/kernel/watchdog_thresh
    echo 5 > /proc/sys/kernel/watchdog_thresh
  repeatedly, the message
 
   NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
 
  stops after a while (after ~10-30 iterations, with fluctuations).
  Maybe watchdog_cpus needs to be atomic ?

That's correct as this again is affected by the asynchronous nature of the
smpboot thread unpark mechanism.

CPU0                            CPU1                    CPU2
write(watchdog_thresh, 6)       
  stop()
    park()
  update()
  start()
    unpark()
                                thread->unpark()
                                  cnt++;
write(watchdog_thresh, 5)                               thread->unpark()
  stop()
    park()                      thread->park()
                                   cnt--;                 cnt++;
  update()
  start()
    unpark()

That's not a functional problem, it just affects the informational message.

Convert watchdog_cpus to atomic_t to prevent the problem.
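
For illustration only, the "first enabler prints the message" pattern can be
sketched in userspace C11. This is not the kernel code: it assumes
<stdatomic.h> as a stand-in for the kernel's atomic_t API, and the names
watchdog_cpus_sketch, sketch_enable() and sketch_disable() are hypothetical.
The point it shows is that a fetch-and-increment hands back the value from
before the increment as part of one indivisible step, so a concurrent
decrement on another CPU cannot be lost between the read and the write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's "static atomic_t watchdog_cpus". */
static atomic_uint watchdog_cpus_sketch;

/*
 * Hypothetical analogue of the enable path: returns true only for the
 * caller that moves the counter from 0 to 1, i.e. the one that would
 * print the informational message.
 */
static bool sketch_enable(void)
{
	/* atomic_fetch_add() returns the value held before the increment. */
	return atomic_fetch_add(&watchdog_cpus_sketch, 1) == 0;
}

/* Hypothetical analogue of the disable path: drop one user atomically. */
static void sketch_disable(void)
{
	atomic_fetch_sub(&watchdog_cpus_sketch, 1);
}
```

With a plain "unsigned int" and "cnt++" / "cnt--", the read-modify-write on
two CPUs can interleave and one update can be dropped, which would leave the
counter above zero and suppress the message on later enables.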

Reported-and-tested-by: Guenter Roeck <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]


---
 kernel/watchdog_hld.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index a7f137c..a84b205 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -12,6 +12,7 @@
 #define pr_fmt(fmt) "NMI watchdog: " fmt
 
 #include <linux/nmi.h>
+#include <linux/atomic.h>
 #include <linux/module.h>
 #include <linux/sched/debug.h>
 
@@ -25,7 +26,7 @@ static DEFINE_PER_CPU(struct perf_event *, dead_event);
 static struct cpumask dead_events_mask;
 
 static unsigned long hardlockup_allcpu_dumped;
-static unsigned int watchdog_cpus;
+static atomic_t watchdog_cpus = ATOMIC_INIT(0);
 
 void arch_touch_nmi_watchdog(void)
 {
@@ -189,7 +190,8 @@ void hardlockup_detector_perf_enable(void)
        if (hardlockup_detector_event_create())
                return;
 
-       if (!watchdog_cpus++)
+       /* use original value for check */
+       if (!atomic_fetch_inc(&watchdog_cpus))
                pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
 
        perf_event_enable(this_cpu_read(watchdog_ev));
@@ -207,7 +209,7 @@ void hardlockup_detector_perf_disable(void)
                this_cpu_write(watchdog_ev, NULL);
                this_cpu_write(dead_event, event);
                cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
-               watchdog_cpus--;
+               atomic_dec(&watchdog_cpus);
        }
 }
 
