Hi,

On Tue, May 2, 2023 at 8:23 AM Petr Mladek <pmla...@suse.com> wrote:
>
> On Mon 2023-05-01 08:24:46, Douglas Anderson wrote:
> > From: Colin Cross <ccr...@android.com>
> >
> > Implement a hardlockup detector that doesn't need any extra
> > arch-specific support code to detect lockups. Instead of using
> > something arch-specific we will use the buddy system, where each CPU
> > watches out for another one. Specifically, each CPU will use its
> > softlockup hrtimer to check that the next CPU is processing hrtimer
> > interrupts by verifying that a counter is increasing.
> >
> > --- /dev/null
> > +++ b/kernel/watchdog_buddy_cpu.c
> > +int watchdog_nmi_enable(unsigned int cpu)
> > +{
> > +        /*
> > +         * The new CPU will be marked online before the first hrtimer interrupt
> > +         * runs on it.
>
> It does not need to be the first hrtimer interrupt. The CPU might have
> been offlined/onlined repeatedly. The counter might have any value.
>
> > +         * If another CPU tests for a hardlockup on the new CPU
> > +         * before it has run its first hrtimer, it will get a false positive.
> > +         * Touch the watchdog on the new CPU to delay the first check for at
> > +         * least 3 sampling periods to guarantee one hrtimer has run on the new
> > +         * CPU.
> > +         */

OK, I've updated the above comment to:

/*
 * The new CPU will be marked online before the hrtimer interrupt
 * gets a chance to run on it. If another CPU tests for a
 * hardlockup on the new CPU before it has run its hrtimer
 * interrupt, it will get a false positive. Touch the watchdog on
 * the new CPU to delay the check for at least 3 sampling periods
 * to guarantee one hrtimer has run on the new CPU.
 */

> > +        per_cpu(watchdog_touch, cpu) = true;
>
> We should touch also the next_cpu:
>
>         /*
>          * We are going to check the next CPU. Our watchdog_hrtimer
>          * need not be zero if the CPU has already been online earlier.
>          * Touch the watchdog on the next CPU to avoid false positive
>          * if we try to check it in less than 3 interrupts.
>          */
>         next_cpu = watchdog_next_cpu(cpu);
>         if (next_cpu < nr_cpu_ids)
>                 per_cpu(watchdog_touch, next_cpu) = true;
>
> Alternative would be to clear watchdog_hrtimer. But it would kind-of
> affect also the softlockup detector.

Looks reasonable. I've incorporated it.
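
For anyone following the thread, the touch-on-enable idea discussed above can be sketched in plain user-space C. This is only an illustrative model, not the code from the patch: NR_CPUS, MISS_THRESHOLD, next_cpu_of(), buddy_enable(), timer_tick() and buddy_check() are hypothetical names, and the "give up after three checks without progress" rule is an assumption standing in for the patch's 3 sampling periods.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS        4   /* hypothetical small system */
#define MISS_THRESHOLD 3   /* assumption: stands in for "3 sampling periods" */

static bool          cpu_online[NR_CPUS];
static unsigned long hrtimer_count[NR_CPUS]; /* bumped by each CPU's own timer */
static unsigned long saved_count[NR_CPUS];   /* last value seen by the watcher */
static unsigned int  missed[NR_CPUS];        /* checks in a row with no progress */
static bool          watchdog_touch[NR_CPUS];

/* Next online CPU after 'cpu', wrapping around; NR_CPUS if there is none. */
static unsigned int next_cpu_of(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	for (unsigned int i = 1; i < NR_CPUS; i++) {
		unsigned int c = (cpu + i) % NR_CPUS;

		if (cpu_online[c])
			return c;
	}
	return NR_CPUS;
}

/* Bring a CPU online and touch both it and the CPU it will start watching. */
static void buddy_enable(unsigned int cpu)
{
	unsigned int next;

	assert(cpu < NR_CPUS);
	cpu_online[cpu] = true;
	/* The new CPU may be checked before its timer has run at all. */
	watchdog_touch[cpu] = true;
	/* The watcher's view of the next CPU may be stale from an earlier online. */
	next = next_cpu_of(cpu);
	if (next < NR_CPUS)
		watchdog_touch[next] = true;
}

/* Models one hrtimer interrupt on 'cpu'. */
static void timer_tick(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	hrtimer_count[cpu]++;
}

/* 'cpu' checks its buddy; returns true if the buddy looks locked up. */
static bool buddy_check(unsigned int cpu)
{
	unsigned int next = next_cpu_of(cpu);

	if (next >= NR_CPUS)
		return false;

	/* A touch or any counter progress restarts the measurement. */
	if (watchdog_touch[next] || hrtimer_count[next] != saved_count[next]) {
		watchdog_touch[next] = false;
		saved_count[next] = hrtimer_count[next];
		missed[next] = 0;
		return false;
	}
	return ++missed[next] >= MISS_THRESHOLD;
}
```

In this model, calling buddy_enable(0) and then buddy_enable(1) leaves both CPUs touched, so the first buddy_check(0) only resynchronizes the saved counter instead of reporting a lockup. A report is raised only after CPU 1 has gone MISS_THRESHOLD further checks without a timer_tick(1).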