Thanks for the info, I will be sure to have our monitor watch the iLO ports...
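For the port watch, something along these lines is probably enough to start with. It is only a minimal sketch: it assumes the iLO answers on the standard https (443) and ssh (22) ports, and the function names and hostname are made up for illustration. A successful TCP connect only shows that the port accepts connections, not that the iLO itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ilo(host, ports=(443, 22), timeout=5.0):
    """Map each port to whether it accepted a TCP connection.

    The default ports are an assumption (standard https and ssh).
    """
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling check_ilo("ilo-node1.example.com") (a placeholder hostname) returns a dict keyed by port, which the monitor can alert on whenever a value is False.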
I've done some testing with fence_ilo and haven't seen a lengthy failover time. I'm running the Python script that is part of the Clustering group. Is that the user contributed one? My testing right now has failover done in a few seconds.

On Mon, Apr 13, 2009 at 2:08 PM, Robert Hurst <[email protected]> wrote:

> You're right about there is no such thing as fail-safe ... but I would
> worry more if I just hard-code a return value of SUCCESS in my scripts.
> Management cards are supposed to work, even if they are powered down -- not
> that there is a loss of power to both lines. If that is the case, no
> electricity == no servers == no cluster, which means you are doing a cold
> boot regardless.
>
> We have both fence_ilo and fence_bladecenter in effect. As good as the iLO
> cards have performed to date, we are still moving off HP DL385s into IBM
> BladeCenter because its management processors are closer to fault tolerant
> than anything else we have experienced. I have had HP iLO cards "crash" and
> not reset themselves -- although later firmware revisions have reduced those
> outages greatly. Monitoring its https and ssh ports for availability are a
> requirement!
>
> There is user-contributed fence_ilo patch listed somewhere in this list
> worth investigating -- it runs A LOT FASTER than the stock one. AFAIK, the
> fence_ilo does not use ssh, but a sort of web soap services call via https.
> We have seen in production and testing that a typical fencing operation
> using fence_ilo is 42-seconds, and a good percentage of time, up to twice as
> long as that. The bladecenter fencing operations we have seen occur in
> under 7-seconds.
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
