On Saturday 10 October 2009 08:09:45 Mark Hahn wrote:
> > We have more then 400 machines. Every month there is one machine that we
> > can not reboot using IPMI or the SOL is not working.
>
> we have something like 2500 nodes, mostly HP dl145g2's, and have a
> BMC-wedge probably 6-12 times/year. can I ask what brand/model has such
> flakey IPMI? if you run "ipmi mc reset" on the node, does it resolve the
> problem? I wonder whether flakiness might also correspond to some config or
> usage pattern. (ours dhcp from a local server - actually all the traffic
> is local.)

These are all Dell machines used for shared hosting. The problems usually
appear during a DoS/DDoS attack or under very high system resource usage
(for example, a load over 100 on a machine with 4 cores). In such situations
IPMI is sometimes unreliable: we can neither connect to the serial console
nor reboot the machine.

-- 
Best regards,
Marian Marinov
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf
