Hello,

For nearly two years I have been experiencing stability problems with the DRAC5 management cards in all the PE1950/PE2950 servers I take care of. The machines mostly run Debian Linux, but also VMware ESXi and Windows 2008. Hangs occur randomly across the whole family of ~35 machines and have the following symptoms:

* the hung DRAC card responds to pings
* the hung DRAC card listens on the usual TCP ports [e.g. 22, 443] but does not answer on them, e.g. no SSH banner
* the hung DRAC card no longer responds to IPMI over LAN requests
* after a few weeks in the hung state, the card no longer responds to pings
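
For illustration, the "port is open but there is no SSH banner" symptom could be told apart from a closed port with a probe along these lines. This is only a sketch: the function name `probe_ssh_banner`, the three result labels and the timeout value are assumptions made up for the example, not part of any Dell tool.

```python
import socket


def probe_ssh_banner(host, port=22, timeout=5.0):
    """Classify a TCP endpoint as 'unreachable', 'no_banner' or 'ok'.

    'no_banner' matches the hang symptom described above: the TCP
    connection is accepted, but no SSH identification string arrives.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, host unreachable, or connect timed out.
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        try:
            # An SSH server sends its identification string first,
            # e.g. b"SSH-2.0-...", without waiting for the client.
            data = sock.recv(64)
        except OSError:
            # Read timed out or failed: port is open but silent.
            return "no_banner"
    return "ok" if data.startswith(b"SSH-") else "no_banner"
```

Calling `probe_ssh_banner("drac-host.example")` (a placeholder hostname) from a monitoring job would then flag cards that still answer pings but return `"no_banner"`.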

So far I have more often had to take fully functional servers down to reset the DRAC than I have used the DRAC to troubleshoot a problematic machine. I have been in contact with Dell's support numerous times. I provided them with network traffic dumps and countless logs, but the best I got was an early beta of the 1.51 firmware, which was released in December 2009. This firmware did make things better, but I still get a hang around once per month.

What do I do with the DRACs?

* I ping them;
* I query them via IPMI over LAN every ~1h;
* I reboot them twice a day.

I started doing all this after they started to hang [I had stability problems before any of this monitoring was introduced]. I also tried not monitoring the DRACs at all; the results were the same: after a while they hung.

Some of the DRACs use the dedicated NIC, others share the LOM; both hang.

What are your experiences with these cards? Did you have similar issues? Do you, by any chance, have any suggestions?

Just in case: I have all firmware/BIOSes upgraded to the most recent versions.

Thanks a lot!

-- 
regards,
Pawel Kudzia / .PaKud
_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
Please read the FAQ at http://lists.us.dell.com/faq
