On Wed, 2009-03-04 at 12:11 -0500, Ms. Megan Larko wrote:
> Greetings,
>
> I have a Lustre OSS with eleven (0-11) OSTs. Every once in a while
> the OSS hosting the OSTs fails with a kernel panic.

To be clear, what you are reporting is not a kernel panic. It is a
watchdog timeout. A kernel panic halts the machine; a watchdog timeout
does not. Both print stack traces, so they are easily confused.

> The system runs
> CentOS 5.1 using Lustre kernel 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.

I would suggest upgrading to 1.6.7. We fix quite a number of bugs with
each point release we do, and given that you are 3 releases behind,
that is a lot of bugs.

> How do I understand the "lock callback timer expired message below?

A client was asked to give back a lock it held and timed out doing so.
This usually indicates a bug or a network failure.

> After the dump the system shows "kernel panic" on console and requires
> a manual reboot.

Maybe you are getting a panic, but there is no evidence of that in what
you pasted below, just the watchdog timeout and its stack trace.

> Any tips and insight greatly appreciated.

Really. If at all possible, upgrade.

b.
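
If it helps to check a saved console or syslog capture for which of these events actually occurred, something like the rough sketch below could tally them. The message fragments are assumptions: "Kernel panic - not syncing" is the usual Linux panic banner, "lock callback timer expired" is the text quoted above, and "Watchdog triggered" is what I would expect the watchdog line to contain, so adjust the patterns to whatever your logs really show. The function name `classify` is only illustrative.

```python
import re
import sys

# Assumed message fragments; edit these to match your own logs.
PATTERNS = {
    "panic": re.compile(r"Kernel panic - not syncing"),
    "watchdog": re.compile(r"Watchdog triggered", re.IGNORECASE),
    "lock_timeout": re.compile(r"lock callback timer expired"),
}


def classify(lines):
    """Count how many lines match each assumed event pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Usage: python classify_log.py /path/to/saved-console.log
    with open(sys.argv[1], errors="replace") as log:
        for name, count in classify(log).items():
            print(f"{name}: {count}")
```

A "panic" count of zero alongside non-zero "watchdog" counts would be consistent with the reading above, though a panic can also fail to reach the log if the machine dies before it is written.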
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
