Hello, There haven't been too many workqueue stall bugs; however, good part of them have been pretty painful to track down because there's no lockup detection mechanism for workqueue and it isn't easy to tell what's going on with workqueues; furthermore, some requirements are tricky to get right - e.g. it's not too difficult to miss WQ_MEM_RECLAIM for a workqueue which runs a work item which is flushed by something which sits in the reclaim path.
To alleviate the situation, this two patch series implements workqueue lockup detector. Each worker_pool tracks the last time it made forward progress and if no forward progress is made for longer than threshold it triggers warnings and dumps workqueue state. It's controlled together with scheduler softlockup mechanism and uses the same threshold value as it shares a lot of the characteristics. Thanks. ------ 8< ------ touch_softlockup_watchdog() is used to tell watchdog that scheduler stall is expected. One group of usage is from paths where the task may not be able to yield for a long time such as performing slow PIO to finicky device and coming out of suspend. The other is to account for scheduler and timer going idle. For scheduler softlockup detection, there's no reason to distinguish the two cases; however, workqueue lockup detector is planned and it can use the same signals from the former group while the latter would spuriously prevent detection. This patch introduces a new function touch_softlockup_watchdog_sched() and convert the latter group to call it instead. For now, it just calls touch_softlockup_watchdog() and there's no functional difference. Signed-off-by: Tejun Heo <[email protected]> Cc: Ulrich Obergfell <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> --- include/linux/sched.h | 4 ++++ kernel/sched/clock.c | 2 +- kernel/time/tick-sched.c | 6 +++--- kernel/watchdog.c | 15 ++++++++++++++- 4 files changed, 22 insertions(+), 5 deletions(-) --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -377,6 +377,7 @@ extern void scheduler_tick(void); extern void sched_show_task(struct task_struct *p); #ifdef CONFIG_LOCKUP_DETECTOR +extern void touch_softlockup_watchdog_sched(void); extern void touch_softlockup_watchdog(void); extern void touch_softlockup_watchdog_sync(void); extern void touch_all_softlockup_watchdogs(void); @@ -387,6 +388,9 @@ extern unsigned int softlockup_panic; extern unsigned int hardlockup_panic; void lockup_detector_init(void); #else +static inline void touch_softlockup_watchdog_sched(void) +{ +} static inline void touch_softlockup_watchdog(void) { } --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -354,7 +354,7 @@ void sched_clock_idle_wakeup_event(u64 d return; sched_clock_tick(); - touch_softlockup_watchdog(); + touch_softlockup_watchdog_sched(); } EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event); --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -143,7 +143,7 @@ static void tick_sched_handle(struct tic * when we go busy again does not account too much ticks. */ if (ts->tick_stopped) { - touch_softlockup_watchdog(); + touch_softlockup_watchdog_sched(); if (is_idle_task(current)) ts->idle_jiffies++; } @@ -430,7 +430,7 @@ static void tick_nohz_update_jiffies(kti tick_do_update_jiffies64(now); local_irq_restore(flags); - touch_softlockup_watchdog(); + touch_softlockup_watchdog_sched(); } /* @@ -701,7 +701,7 @@ static void tick_nohz_restart_sched_tick update_cpu_load_nohz(); calc_load_exit_idle(); - touch_softlockup_watchdog(); + touch_softlockup_watchdog_sched(); /* * Cancel the scheduled timer and restore the tick */ --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -225,7 +225,15 @@ static void __touch_watchdog(void) __this_cpu_write(watchdog_touch_ts, get_timestamp()); } -void touch_softlockup_watchdog(void) +/** + * touch_softlockup_watchdog_sched - touch watchdog on scheduler stalls + * + * Call when the scheduler may have stalled for legitimate reasons + * preventing the watchdog task from executing - e.g. the scheduler + * entering idle state. This should only be used for scheduler events. + * Use touch_softlockup_watchdog() for everything else. + */ +void touch_softlockup_watchdog_sched(void) { /* * Preemption can be enabled. It doesn't matter which CPU's timestamp @@ -233,6 +241,11 @@ void touch_softlockup_watchdog(void) */ raw_cpu_write(watchdog_touch_ts, 0); } + +void touch_softlockup_watchdog(void) +{ + touch_softlockup_watchdog_sched(); +} EXPORT_SYMBOL(touch_softlockup_watchdog); void touch_all_softlockup_watchdogs(void) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

