Hello,

There is an LBUG lnet upcall that is called right after LBUG() and before panic() is entered. The upcall is asynchronous, but because the kernel's call_usermodehelper() allocates memory, it may hang, and then panic() is never invoked. As a result, a sysadmin cannot get a crash dump to debug the underlying issue. The fact that lnet_upcall points to a non-existent path by default does not help; the system may hang anyway.
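To illustrate why a non-existent default path does not help, here is a minimal userspace sketch, assuming a hypothetical "NONE" sentinel that disables the upcall. The names `lbug_upcall_wanted()`, `on_lbug()` and `LBUG_UPCALL_DISABLED` are made up for illustration and are not from the Lustre sources or the LU-8418 patch. The point it models is that the decision to skip must be made from the setting alone, before anything that could allocate or block is started.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel meaning "never run an LBUG upcall".
 * Illustration only, not a value taken from the Lustre sources. */
#define LBUG_UPCALL_DISABLED "NONE"

/* Decide, before anything is allocated or launched, whether an upcall
 * should be attempted. Only the configured string is inspected, so this
 * check cannot block. A path to a file that does not exist still returns
 * true: nothing here looks at the filesystem, mirroring the point that a
 * missing default script is only discovered after the helper launch has
 * already started allocating. */
bool lbug_upcall_wanted(const char *upcall_path)
{
    return upcall_path != NULL &&
           upcall_path[0] != '\0' &&
           strcmp(upcall_path, LBUG_UPCALL_DISABLED) != 0;
}

/* Hypothetical ordering after an LBUG: optional upcall first, then panic.
 * The callbacks stand in for the kernel-side helper launch and panic(). */
typedef void (*upcall_fn)(const char *path);
typedef void (*panic_fn)(void);

void on_lbug(const char *upcall_path, upcall_fn run_upcall, panic_fn do_panic)
{
    if (lbug_upcall_wanted(upcall_path))
        run_upcall(upcall_path); /* in the kernel this step may block */
    do_panic();                  /* reached only if the step above returns */
}
```

With the setting at the sentinel, on_lbug() goes straight to do_panic(); with a default path that points nowhere, it still takes the upcall branch, which is the risky case described above.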
I filed LU-8418 with a patch for that. The patch makes it possible to skip any attempt to call the lnet upcall, which makes getting a crash dump more reliable. The question is whether the default behavior of libcfs_lnet_upcall() should be changed so that it does not call lnet_upcall; the patch contains such a change. The idea behind changing the default is that the Lustre source does not contain any implementation of an lnet upcall, yet the default value of "lnet_upcall" points to nowhere. I think very few Lustre installations use their own LBUG lnet upcall script; the rest just use the default settings, i.e. a non-working lnet upcall with a potential risk of panic() not being called after LBUG(), so the system does not reboot or does not produce a crash dump when expected. I am curious: does anybody really use an LBUG lnet upcall script nowadays?

Thanks,
--
Alexander Zarochentsev
Seagate Technology, LLC
www.seagate.com
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
