Hi!

> Below is a patch from android kernel that detects a driver suspend
> lockup and captures dump in the kernel log. Please review and provide
> comments.
>
> Rather than hard-lock the kernel, dump the suspend thread stack and
> BUG() when a driver takes too long to suspend. The timeout is set to
> 12 seconds to be longer than the usbhid 10 second timeout.
>
> Exclude from the watchdog the time spent waiting for children that
> are resumed asynchronously and time every device, whether or not they
> resumed synchronously.
>
> Cc: Android Kernel Team <kernel-t...@android.com>
> Cc: Colin Cross <ccr...@android.com>
> Cc: Todd Poynor <toddpoy...@google.com>
> Cc: San Mehat <s...@google.com>
> Cc: Benoit Goby <ben...@android.com>
> Cc: John Stultz <john.stu...@linaro.org>
> Cc: Pavel Machek <pa...@ucw.cz>
> Cc: Rafael J. Wysocki <r...@sisk.pl>
> Cc: Len Brown <len.br...@intel.com>
> Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> Original-author: San Mehat <s...@google.com>
> Signed-off-by: Benoit Goby <ben...@android.com>
> [zoran.marko...@linaro.org: Changed printk(KERN_EMERG,...) to pr_emerg(...),
> tweaked commit message.]
> Signed-off-by: Zoran Markovic <zoran.marko...@linaro.org>
> ---
>  drivers/base/power/main.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 15beb50..eb70c0e 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -29,6 +29,8 @@
>  #include <linux/async.h>
>  #include <linux/suspend.h>
>  #include <linux/cpuidle.h>
> +#include <linux/timer.h>
> +
>  #include "../base.h"
>  #include "power.h"
>
> @@ -54,6 +56,12 @@ struct suspend_stats suspend_stats;
>  static DEFINE_MUTEX(dpm_list_mtx);
>  static pm_message_t pm_transition;
>
> +static void dpm_drv_timeout(unsigned long data);
> +struct dpm_drv_wd_data {
> +	struct device *dev;
> +	struct task_struct *tsk;
> +};
> +
>  static int async_error;
>
>  /**
> @@ -663,6 +671,30 @@ static bool is_async(struct device *dev)
>  }
>
>  /**
> + * dpm_drv_timeout - Driver suspend / resume watchdog handler
> + * @data: struct device which timed out
> + *
> + * Called when a driver has timed out suspending or resuming.
> + * There's not much we can do here to recover so
> + * BUG() out for a crash-dump
> + *
> + */
> +static void dpm_drv_timeout(unsigned long data)
> +{
> +	struct dpm_drv_wd_data *wd_data = (void *)data;
> +	struct device *dev = wd_data->dev;
> +	struct task_struct *tsk = wd_data->tsk;
> +
> +	pr_emerg("**** DPM device timeout: %s (%s)\n", dev_name(dev),
> +		 (dev->driver ? dev->driver->name : "no driver"));
> +
> +	pr_emerg("dpm suspend stack:\n");
> +	show_stack(tsk, NULL);
> +
> +	BUG();
> +}

So you:

	dump the stack of the suspend task

	do BUG(), which
		dumps the stack of the current task
		kills the current task

The timer handler does not run in the context of the suspend task, so
the current task may very well be the idle task; in that case you kill
the machine.

Sounds like you should be doing something else, like kill -9 on the
suspend task instead of BUG()?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html