Looks like it's time for the "Dilger Procedure", yes? See http://wiki.hpc.ufl.edu/index.php/Lustre
At least, it sounds like the same thing we encountered and this worked for us.

Charlie Taylor
UF HPC Center

On Sep 15, 2008, at 10:52 AM, Dan wrote:

> Hi,
>
> One of my OSSs crashed last week. All OSTs on it recover and mount
> except one that causes a kernel panic when it starts replaying
> (after waiting for clients to connect). I fscked it this weekend
> and found no errors but still it panics the system.
>
> My only idea on how to fix it was to run lctl --device 12
> abort_recovery. The instant you run this it causes a kernel panic.
> Something's not right about the replay info I guess. I've brought up
> the other OSTs and deactivated them on the MGS/MDT but I cannot get
> the clients to mount anyway. Do I need to deactivate it on the
> clients too?
>
> Help!
>
> Dan
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
