Hello Nicholas,

On Tue, Apr 2, 2019 at 4:57 PM Nicholas Piggin <npig...@gmail.com> wrote:
>
> Using a jiffies timer creates a dependency on the tick_do_timer_cpu
> incrementing jiffies. If that CPU has locked up and jiffies is not
> incrementing, the watchdog heartbeat timer for all CPUs stops and
> creates false positives and confusing warnings on local CPUs, and
> also causes the SMP detector to stop, so the root cause is never
> detected.
>
> Fix this by using hrtimer based timers for the watchdog heartbeat,
> like the generic kernel hardlockup detector.
>
> Reported-by: Ravikumar Bangoria <ravi.bango...@in.ibm.com>
> Signed-off-by: Nicholas Piggin <npig...@gmail.com>

[..snip..]

> @@ -325,19 +325,21 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog);
>
>  static void start_watchdog_timer_on(unsigned int cpu)
>  {
> -       struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
> +       struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);

This function can be called during initialization via

    watchdog_nmi_start -->
        for_each_online_cpu(cpu)
            start_wd_on_cpu(cpu) -->
                start_watchdog_timer_on(cpu)

Since this_cpu_ptr() resolves to the calling CPU's wd_hrtimer, it is
not guaranteed that we are always calling start_watchdog_timer_on()
from the CPU on which we want to start the watchdog timer.

Should we therefore be calling this function from start_wd_on_cpu()
via smp_call_function_single()?

>
>         per_cpu(wd_timer_tb, cpu) = get_tb();
>
> -       timer_setup(t, wd_timer_fn, TIMER_PINNED);
> -       wd_timer_reset(cpu, t);
> +       hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +       hrtimer->function = watchdog_timer_fn;
> +       hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
> +                     HRTIMER_MODE_REL_PINNED);
>  }
>
>  static void stop_watchdog_timer_on(unsigned int cpu)
>  {
> -       struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
> +       struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
>
> -       del_timer_sync(t);
> +       hrtimer_cancel(hrtimer);
>  }
>
>  static int start_wd_on_cpu(unsigned int cpu)
> --
> 2.20.1
>

--
Thanks and Regards
gautham.