On 2006-01-20T12:37:10, Peter Kruse <[EMAIL PROTECTED]> wrote:

> 1. There is an ifmonitord that monitors all network interfaces in the 
> cluster
> and writes the current status to the cib so it is available to all nodes.
> When a network interface fails (link goes down), before I return
> an error and cause a failover, I check if the other node has
> a link status of "up" for the specified interface.  This is obviously
> necessary before it can take over.  If the status is not "up",
> no error is returned.  This is for clusters that are not fully
> redundant to minimize the risk of a false alarm.

OK, this we'll eventually provide again. (ipfail)

However, that's pretty close to how we eventually want to support this.
If you already have the ifmonitord written, it'd be a small step for you
to actually feed this into the CIB as dampened node attributes, right
(instead of doing it within the resource agent)? And then we could
handle this internally, and you could claim to have contributed a major
feature to heartbeat 2.0.x! ;-)

> 2. You can set a resource in maintenance mode, which prevents
> the monitor action from returning an error.  This variable is also
> stored in the cib, so the RA has to check it every monitor
> interval.

Uhm, that is already supposed to exist within the CRM, if you set a
resource to unmanaged. We probably need an in-between state of "not
monitored" (or monitor failures ignored) instead of completely unmanaged
though.

Andrew?

> 3. you can set the maximum number of restarts before a real
> failover occurs, this is also stored in the cib.

This _definitely_ belongs in a generic feature within the CRM.
Handling it within the RA is not the right place. We have an AI for it,
ETA is 2.0.3 or 2.0.4 (Andrew?).

_If_ you're handling it within the RA, there's no point in storing it
within the CIB. That's a waste, because the CIB sync is pretty
expensive.

Set an instance parameter (which you'll then get within your environment
of course) and keep track of the number of local restarts within a file
under ${HA_RSCTMP} (that gets cleaned out on reboots).
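
A rough sketch of what that could look like inside a shell RA. The
parameter name "max_restarts" and the state file name are made up for
illustration; only ${HA_RSCTMP} and the OCF_RESKEY_ / OCF_RESOURCE_INSTANCE
environment conventions are real. The /tmp fallback is just an assumption
so the snippet can be run outside the cluster.

```shell
#!/bin/sh
# Sketch only: count local restarts in a file under ${HA_RSCTMP}.
# "max_restarts" is a hypothetical instance parameter, not an existing one.

: "${HA_RSCTMP:=/tmp}"                  # assumed fallback outside the cluster
: "${OCF_RESOURCE_INSTANCE:=example}"   # normally set by the cluster for the RA
: "${OCF_RESKEY_max_restarts:=3}"       # hypothetical parameter, default 3

# Hypothetical state file; one per resource instance.
restart_file="${HA_RSCTMP}/${OCF_RESOURCE_INSTANCE}.restarts"

# Print the number of local restarts recorded so far (0 if no file yet).
restart_count() {
    if [ -r "$restart_file" ]; then
        cat "$restart_file"
    else
        echo 0
    fi
}

# Record one more local restart.
restart_bump() {
    n=$(restart_count)
    echo $((n + 1)) > "$restart_file"
}

# Return 0 while another local restart is still within the limit.
restart_allowed() {
    [ "$(restart_count)" -lt "$OCF_RESKEY_max_restarts" ]
}

# Forget the history, e.g. after a clean start or stop.
restart_reset() {
    rm -f "$restart_file"
}
```

On a failed monitor, the RA would check restart_allowed; if it succeeds,
call restart_bump and restart the service locally, otherwise return the
error so that the failover happens. Nothing goes through the CIB, and the
count disappears on reboot together with the rest of ${HA_RSCTMP}.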


Sincerely,
    Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
                                                 -- Charles Darwin

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
