On Sat, 2009-09-26 at 14:53 +0200, Nick Jennings wrote:
> Heya Nick,
> Unfortunately that was the only info I could get. The client had no
> information in the logs about what happened.

They usually don't when they panic.

> The MDS only had the
> following entry near the time:
>
> Sep 25 22:28:43 dbn1 kernel: Lustre: MGS: haven't heard from client
> ab5e5f08-e39d-385d-f7e3-fbd1addb0fac (at 10.0.0...@tcp1) in 248 seconds.
> I think it's dead, and I am evicting it.

That's because the client panic'd.

> Is there any other info I should be gathering when something like this
> happens? (Sorry, it's been a while since I've done any lustre bug
> reporting) :)

The only/most useful info you can get from a client panic is what was on
the console when it panic'd. Does the "Hosting co." not have their
machines hooked up to (i.e. serial) consoles?

netconsole can usually be useful as a substitute for a physically
connected serial console. Do you know if the client kernel has
netconsole (usually bundled with netdump) available? Can you put a new
kernel on the client that includes netdump? You could set up a netdump
server somewhere (across the Internet even) then and see if you can get
anything useful when it panics.

Cheers,
b.
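
For reference, the receiving side of the netconsole suggestion above can be sketched in a few lines. This is only an illustration, not a netdump server: it assumes the client sends plain-text netconsole messages over UDP, and that the target port is 6666 (the usual netconsole default; adjust if yours differs). The function names and the log file name are made up for the example.

```python
import socket


def make_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming console messages.

    Port 6666 is an assumption (the usual netconsole target default).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_once(sock, bufsize=4096):
    """Block until one datagram arrives; return (sender_ip, text)."""
    data, (addr, _port) = sock.recvfrom(bufsize)
    return addr, data.decode("utf-8", errors="replace")


def format_line(addr, text):
    """Prefix a message with the sender's address, dropping trailing newlines."""
    return f"[{addr}] {text.rstrip()}"


def serve(port=6666, logfile="netconsole.log"):
    """Append every received message to a log file until interrupted."""
    sock = make_listener(port=port)
    with open(logfile, "a") as log:
        while True:
            addr, text = receive_once(sock)
            log.write(format_line(addr, text) + "\n")
            log.flush()  # a panic gives no second chance, so flush each line
```

Calling serve() on a machine the client can reach would collect whatever the client's kernel manages to send before it dies. On the client, the netconsole module takes its target as a parameter of the form netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]; the actual addresses depend on your network.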
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
