On Monday 01 August 2011 13:58:55 Trevor Hemsley wrote:
> Today they did it again. And then several more times - about every 20
> minutes in fact. The servers are in a remote data centre and I have no
> console access and the iLO's on these two servers are not set up and I'm
> unable to use them so I can see no output on the console. There's no
> information in /var/log about what the problem is, all I see is that one
> of the servers reboots itself and then 5 to 10 seconds later, the 2nd
> one follows it. I've seen from the logs that it's not always the same
> one that reboots first, sometimes it's one and sometimes the other. The
> only way I've managed to get the servers out of their 20 minute reboot
> loop is to stop drbd on one of the pair and migrate all my VMs to run on
> the other with all the DRBD devices in standalone mode. This seems to me
> to indicate that DRBD is most probably involved in the reboot.

Just a shot in the dark (because I was hit by the same thing last Friday):
Is there a watchdog active and set to a timeout of 20 minutes? It could be
that the corresponding userspace tool was removed or rendered unusable
during the update...

Good luck,

Arnold
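
One way to check for an active watchdog, as a rough sketch only: on kernels that expose watchdog devices through sysfs (this needs the watchdog sysfs support compiled in, which older kernels may not have), each device appears under /sys/class/watchdog with attributes such as identity, state and timeout. The helper names list_watchdogs and matches_interval below are made up for illustration, and the 1200 second default is simply the 20 minutes from the report.

```python
from pathlib import Path


def list_watchdogs(sysfs_root="/sys/class/watchdog"):
    """Return one dict per watchdog device found under sysfs_root.

    Assumes the kernel exposes watchdog attributes via sysfs; if the
    directory does not exist, an empty list is returned.
    """
    root = Path(sysfs_root)
    devices = []
    if not root.is_dir():
        return devices
    for dev in sorted(root.iterdir()):
        info = {"name": dev.name}
        for attr in ("identity", "state", "timeout"):
            try:
                info[attr] = (dev / attr).read_text().strip()
            except OSError:
                # Attribute missing or unreadable on this kernel/driver.
                info[attr] = None
        devices.append(info)
    return devices


def matches_interval(info, interval_seconds=1200):
    """True if the device's timeout equals the observed reboot interval."""
    timeout = info.get("timeout")
    return timeout is not None and timeout.isdigit() and int(timeout) == interval_seconds


if __name__ == "__main__":
    found = list_watchdogs()
    if not found:
        print("no watchdog devices visible in sysfs")
    for info in found:
        flag = " <-- matches 20 minute interval" if matches_interval(info) else ""
        print(f"{info['name']}: identity={info['identity']} "
              f"state={info['state']} timeout={info['timeout']}{flag}")
```

If nothing shows up here, that does not rule out a watchdog: a hardware or BMC-level watchdog may not be visible this way, so checking whether /dev/watchdog exists and which process (if any) has it open is still worthwhile.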
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user
