Hello,

I am working on a feature to add system health metrics to HA.  With this
information, HA could fail resources over, away from nodes whose hardware
might have problems.

The following is a short description of what we want this new feature to
do.

Feature Name:  Health monitoring support
Purpose:       Allow pacemaker to schedule resources in a way that's
               sensitive to a variety of server-related health metrics

Description:
Add support in pacemaker for a class of node attributes that would be
treated specially.  Under this proposal, the value of every attribute
defined for a node whose name matches the regular expression
/^#health-.*$/ would be automatically added into the score for each
resource being considered for scheduling on that node.
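
To make the intended arithmetic concrete, here is a rough sketch in Python
(not pacemaker code).  The function names are made up for illustration, and
I am assuming the usual score rules: INFINITY is a large sentinel (taken
here to be 1000000), and -INFINITY wins over anything it is added to.

```python
import re

# Assumption: INFINITY is a large sentinel value, as in pacemaker scores.
INFINITY = 1_000_000

# The attribute-name pattern from the proposal.
HEALTH_ATTR = re.compile(r"^#health-.*$")


def parse_score(value):
    """Turn an attribute value such as "1000" or "-INFINITY" into an int."""
    text = str(value).strip()
    if text in ("INFINITY", "+INFINITY"):
        return INFINITY
    if text == "-INFINITY":
        return -INFINITY
    return int(text)


def add_scores(a, b):
    """Add two scores; -INFINITY dominates, then +INFINITY (assumed rules)."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def health_adjusted_score(base_score, node_attrs):
    """Hypothetical helper: fold every #health-* attribute into a score."""
    total = base_score
    for name, value in node_attrs.items():
        if HEALTH_ATTR.match(name):  # other attributes are ignored
            total = add_scores(total, parse_score(value))
    return total
```

With a base score of 50, a node carrying #health-ipmi=1000 and
#health-smarttools=0 would end up at 1050, while a node on which any one
monitor reports -INFINITY ends up at -INFINITY no matter what the other
monitors say.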

The purpose of this is to allow multiple independent health monitors to
each set their own health status and have that taken into account when
scheduling resources.  For example, IBM might define one called
#health-ibmserver.  Someone using smarttools (disk health monitors) might
define one called #health-smarttools.  Someone else using IPMI might define
one called #health-ipmi.   This means that this feature is not specific to
any vendor, and various health monitor providers can develop health metrics
for their hardware and not have to coordinate with each other in their
development process.

Typical usage of these variables is expected to be something like this:

      Health   Attribute value   Meaning
      green    1000              server is happy, capable of running any
                                 resource
      yellow   0                 server is marginal - it is desirable to
                                 schedule resources somewhere else if you
                                 can
      red      -INFINITY         server is unreliable (but still up) and
                                 should not be used

Note that the value given for green is likely to be configuration-specific,
and should be configurable by the various health monitoring tools as they
get developed.
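
As an illustration only, a health monitor might map its status to an
attribute value along these lines.  The function name and the "green"
parameter are hypothetical; the default of 1000 is just the example value
from the table above.

```python
def health_value(status, green=1000):
    """Hypothetical helper: map a health status to an attribute value.

    The value for "green" is configurable, as suggested in the proposal;
    "yellow" and "red" use the values from the table.
    """
    values = {
        "green": str(green),
        "yellow": "0",
        "red": "-INFINITY",
    }
    if status not in values:
        raise ValueError("unknown health status: %r" % (status,))
    return values[status]
```

A monitor for IPMI, say, would then store health_value("yellow") in the
node attribute #health-ipmi when it sees a marginal condition.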

Special Note:
IBM is already in the process of developing such a health monitoring tool
for IBM X (Intel-class) servers.

So, what do you all think of this proposed functionality?  Does it sound
reasonable?  Comments are appreciated.

Mark
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
