Thanks for all the comments. I'll try netdump and see how it goes. It'll take a while, but I'll be back.
Anyway, could someone answer my second and third questions?

- Is RECOVERING enough? Should we run e2fsck + lfsck every time Lustre fails?
- Quota is turned off when *any* OSS node fails. Is there any way to keep it "always on"?

BTW, when I turn quota back on, the quota settings sometimes go wrong: some OSSes have only 1 byte while the others have the proper value. We work around this by resetting the quota to all zeros and then setting it again. Is this normal?

Johann Lombardi wrote:
> On Mon, Nov 26, 2007 at 08:49:56PM +0700, Somsak Sriprayoonsakul wrote:
>
>> Could you tell me how to dump the whole crash log to file? It's not
>> appear in /var/log/messages. I only seen it once actually. That's why I
>> don't know the function name :) But the whole screen are something
>> related to lustre for sure.
>>
>
> You should set up serial consoles (or netconsole). A crash dump utility
> (netdump, LKCD, ...) is also very useful.
>
>
>> Note that, the dump log is longer than a screen size, so taking photo
>> wouldn't help ( I think ).
>>
>
> If /proc/sys/kernel/panic_on_oops is set to 1 on the OSS, you could try to set
> it to 0 and to log onto the node to get the stack trace via dmesg before
> rebooting it.
>
> Johann
>
>

--
-----------------------------------------------------------------------------------
Somsak Sriprayoonsakul

Thai National Grid Center
Software Industry Promotion Agency
Ministry of ICT, Thailand
[EMAIL PROTECTED]
-----------------------------------------------------------------------------------
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
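
For reference, Johann's panic_on_oops suggestion quoted above could be sketched roughly as follows. This is only a minimal sketch, not something posted in the thread: the helper names are hypothetical, it assumes the standard Linux procfs path he mentions, and changing the value requires root on the OSS.

```python
from pathlib import Path

# Procfs path taken from Johann's reply; everything else here is illustrative.
PANIC_ON_OOPS = Path("/proc/sys/kernel/panic_on_oops")


def panic_on_oops_enabled(path=PANIC_ON_OOPS):
    """Return True if the kernel is set to panic when an oops occurs."""
    return path.read_text().strip() != "0"


def disable_panic_on_oops(path=PANIC_ON_OOPS):
    """Write 0 so the node stays up after an oops (needs root on a real node)."""
    path.write_text("0\n")


# Hypothetical usage on the OSS, run as root:
#   if panic_on_oops_enabled():
#       disable_panic_on_oops()
# After the next oops, log in and save the dmesg output before rebooting.
```

With the value at 0 the node should stay reachable after an oops, so the stack trace can be copied out of dmesg instead of being read off the console screen.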
