Hello, all,

I'm new to the heartbeat project and I'm having some problems
setting things up. I would be very grateful for any help.

I want to use heartbeat to detect when a node in a computational
cluster is disconnected, and also when the node comes back onto the
network. On both events a custom script should be started, either on the
head node only or on both the head node and the compute node.

I've managed to set up heartbeat on two nodes (A and B), and I can observe
the node status with crm_mon. When I simulate a loss of connectivity on
node B (using ifdown), after some time I can see the status change: on
node A, crm_mon reports that B is "OFFLINE" and A is "online", while on
node B, crm_mon reports that A is "OFFLINE" and B is "online". In this case
I simply cannot determine from the crm_mon data (or cib.xml) which node is
actually down and where I should shut down my processes.
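
The ambiguity above could in principle be broken by a third reference point
that every healthy node can reach. The following is only a minimal sketch of
that idea, not part of heartbeat: classify_partition, the status dictionary,
and the reachability callback are hypothetical names, and the reference
address (for example the default gateway) is an assumption.

```python
from typing import Callable, Dict


def classify_partition(
    peer_status: Dict[str, str],
    local_node: str,
    can_reach_reference: Callable[[], bool],
) -> str:
    """Decide which side of a two-node split is cut off.

    peer_status: node name -> "online"/"OFFLINE" as seen locally
        (hypothetical input, e.g. transcribed from what crm_mon shows).
    can_reach_reference: returns True if a third address, assumed to be
        reachable from any healthy node, answers.
    """
    # Collect every node other than ourselves that we see as offline.
    others_offline = [
        name
        for name, state in peer_status.items()
        if name != local_node and state.lower() == "offline"
    ]
    if not others_offline:
        return "healthy"
    # We lost a peer. If we can still reach the reference, the peer is the
    # one that is cut off; otherwise we are the isolated side.
    if can_reach_reference():
        return "peer-down"
    return "self-isolated"


# The two symmetric views described above.
view_on_a = {"A": "online", "B": "OFFLINE"}
view_on_b = {"A": "OFFLINE", "B": "online"}

print(classify_partition(view_on_a, "A", lambda: True))   # peer-down
print(classify_partition(view_on_b, "B", lambda: False))  # self-isolated
```

In a real setup the callback would have to wrap an actual connectivity check
(such as a ping to the gateway); it is passed in as a function here so the
decision logic can be exercised without any network access.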

Moreover, when I bring the network interface on node B back up, the status
of the node doesn't change at all. Only when I restart the heartbeat service,
on either node B or node A, does the status of node B return to
"online". That's pretty odd behavior, I think.

My questions are the following:
1) Is heartbeat an appropriate solution for my task, or should I use something else?
2) If heartbeat is fine, then what am I doing wrong? Why doesn't the status of node B
return to "online" when I bring the network interface back up?
3) Is there any API to check the status of a node? Parsing cib.xml is not very convenient.

Here's my configuration:
CentOS, kernel 2.6.18, x86
heartbeat version 2.0.1, installed from an RPM package

ha.cf:
----------------------------------------------------------------------------------
use_logd yes
bcast eth0
node A B
crm on
auto_failback on
---------------------------------------------------------------------------------

authkeys:
---------------------------------------------------------------------------------
auth 1
1 sha1 helloworld
---------------------------------------------------------------------------------

logd.cf:
---------------------------------------------------------------------------------
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     daemon
---------------------------------------------------------------------------------

Thank you for your help.


--
Artem Pervin
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
