Re: [Linux-cluster] Fenced failing continuously

Ian Hayes Mon, 13 Apr 2009 13:27:08 -0700

Thanks for the info, I will be sure to have our monitor watch the iLO
ports...
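
For anyone else wanting to watch the management cards, here is a minimal
sketch of such a check in Python. It only tests that the https (443) and
ssh (22) TCP ports accept a connection; it does not log in or talk to the
iLO itself. The function names, the default port list and the timeout are
my own assumptions, not part of any fencing package.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_management_card(host, ports=(443, 22), timeout=3.0):
    """Map each port to True/False depending on whether it is reachable.

    The defaults (443 for https, 22 for ssh) are assumed; adjust them if
    your cards listen elsewhere.
    """
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling check_management_card() with the card's hostname returns a dict
such as {443: True, 22: False}; a monitor could alert whenever any value
is False.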


I've done some testing with fence_ilo and haven't seen a lengthy failover
time. I'm running the Python script that is part of the Clustering group. Is
that the user-contributed one? My testing right now has failover done in a
few seconds.
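
To compare numbers like for like, it may help to time the fence agent
call itself rather than the whole failover. Below is a rough sketch of a
timing wrapper in Python; it runs whatever command line you hand it and
reports the exit status and wall-clock duration. The helper name and the
120-second default timeout are my own assumptions, and the command to
pass in is whatever agent invocation your cluster already uses.

```python
import subprocess
import time


def time_command(argv, timeout=120.0):
    """Run argv (a list of strings) and return (returncode, elapsed_seconds).

    returncode is None if the command did not finish within timeout.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        returncode = completed.returncode
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        returncode = None
    elapsed = time.monotonic() - start
    return returncode, elapsed
```

Running it a handful of times against each agent and comparing the
elapsed values should show whether the agent or something else in the
failover path accounts for the difference.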


On Mon, Apr 13, 2009 at 2:08 PM, Robert Hurst <[email protected]> wrote:

> You're right that there is no such thing as fail-safe ... but I would
> worry more if I just hard-coded a return value of SUCCESS in my scripts.
> Management cards are supposed to work even when the server is powered
> down, as long as power is not lost on both lines.  If it is, no
> electricity == no servers == no cluster, which means you are doing a cold
> boot regardless.
>
> We have both fence_ilo and fence_bladecenter in effect.  As good as the iLO
> cards have performed to date, we are still moving off HP DL385s into IBM
> BladeCenter because its management processors are closer to fault tolerant
> than anything else we have experienced.  I have had HP iLO cards "crash" and
> not reset themselves -- although later firmware revisions have reduced those
> outages greatly.  Monitoring their https and ssh ports for availability is
> a requirement!
>
> There is a user-contributed fence_ilo patch listed somewhere in this list
> worth investigating -- it runs A LOT FASTER than the stock one.  AFAIK,
> fence_ilo does not use ssh, but a sort of web soap services call via https.
> We have seen in production and testing that a typical fencing operation
> using fence_ilo takes 42 seconds and, a good percentage of the time, up to
> twice as long as that.  The bladecenter fencing operations we have seen
> complete in under 7 seconds.
>
>

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
