Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the 
following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11505



(In reply to comment #0)
> From a conversation with hch (kernel developer) on IRC, the
> find_task_by_pid_type() (and therefore find_task_by_pid()) export is going
> away because there are no in-kernel users of this export.  We use it as part
> of our watchdog mechanism, to dump the stack of a slow process.

In recent Lustre changes, configure detects whether this function is exported
(HAVE_TASKLIST_LOCK), and watchdog stack dumping is disabled when it is not.  It
would be easy to fix the watchdog code so that it no longer needs
find_task_by_pid{_type}(), because the watchdog setup already knows which task
is being watched (lcw_tsk).  However, unpatched kernels ALSO do not allow dumping
the stack of another process (show_task() is not exported by default, and
dump_stack() only operates on the current process), so libcfs_debug_dumpstack()
would still not produce any useful information.
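
As a rough illustration of the point above, here is a user-space sketch (not the
actual libcfs code) of a watchdog that keeps the watched task in lcw_tsk and so
never needs a pid lookup.  Everything except the names lcw_tsk, show_task() and
dump_stack() is an assumption made up for the example: struct task, current_task,
the HAVE_SHOW_TASK macro, debug_dumpstack() and watchdog_fire() are hypothetical
stand-ins, and the printf() calls only mark where the real kernel calls would go.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's struct task_struct. */
struct task {
    int pid;
    const char *comm;
};

/* Stand-in for the watchdog; only the field name lcw_tsk is from the text. */
struct lc_watchdog {
    struct task *lcw_tsk;   /* task being watched, recorded at setup time */
};

/* Stand-in for the kernel's "current" task pointer. */
static struct task *current_task;

enum dump_result { DUMPED_OTHER, DUMPED_SELF, DUMP_UNAVAILABLE };

/*
 * Dump the stack of tsk if the kernel allows it.  HAVE_SHOW_TASK is an
 * assumed configure macro meaning "show_task() is exported".
 */
static enum dump_result debug_dumpstack(struct task *tsk)
{
#ifdef HAVE_SHOW_TASK
    if (tsk == NULL)
        tsk = current_task;
    /* A patched kernel would call show_task(tsk) here. */
    printf("show_task() for pid %d (%s)\n", tsk->pid, tsk->comm);
    return tsk == current_task ? DUMPED_SELF : DUMPED_OTHER;
#else
    if (tsk == NULL || tsk == current_task) {
        /* dump_stack() only covers the calling task. */
        printf("dump_stack() for current pid %d\n", current_task->pid);
        return DUMPED_SELF;
    }
    printf("cannot show stack of pid %d: show_task() not exported\n",
           tsk->pid);
    return DUMP_UNAVAILABLE;
#endif
}

/* Watchdog expiry: uses the stored task pointer, no find_task_by_pid(). */
static enum dump_result watchdog_fire(const struct lc_watchdog *lcw)
{
    return debug_dumpstack(lcw->lcw_tsk);
}
```

Built without HAVE_SHOW_TASK, watchdog_fire() on a task other than the caller
can only report that the dump is unavailable, which is the situation described
for unpatched kernels.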

If you could convince Christoph and Arjan that being able to dump the stack of
_another_ process is useful (i.e. allowing show_task() to be exported), that
would allow Lustre watchdogs to work on unpatched kernels.  Also, Arjan's
comment that the kernel already has a bunch of software watchdogs is likely true
but completely useless here.  I suspect he is referring to full-system software
watchdogs that emulate hardware watchdogs by using the NMI, and not the
per-process watchdogs that Lustre uses to detect hung threads.

_______________________________________________
Lustre-devel mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
