Hi Artem,

It looks like you selected the wrong (or not exactly the right) product for your project. Heartbeat (Linux-HA) was designed to provide high availability for applications, not distributed computation. In an HA cluster, in most cases a node that loses its connection to the cluster has to be killed (see the STONITH feature) to prevent data corruption on shared devices in a split-brain situation.
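To illustrate why crm_mon gives you mirror-image answers on the two nodes, here is a rough sketch of timeout-based failure detection. It is not Heartbeat's actual code. The NodeView class and simulate_partition function are made up for this example, the deadtime name is borrowed from the ha.cf directive, and the 30-second value is just an assumption. Each node knows only which heartbeats it has received itself, so after a link failure both sides reach the same kind of conclusion about each other:

```python
from dataclasses import dataclass, field


@dataclass
class NodeView:
    """One node's local opinion of the cluster, built only from heartbeats it received.

    Hypothetical illustration, not part of Heartbeat.
    """
    name: str
    deadtime: float  # seconds of silence before a peer is declared dead (assumed value)
    last_seen: dict = field(default_factory=dict)

    def heard_from(self, peer: str, now: float) -> None:
        """Record the arrival time of a heartbeat from a peer."""
        self.last_seen[peer] = now

    def status(self, now: float) -> dict:
        """Return this node's view: itself plus every peer it has ever heard from."""
        view = {self.name: "online"}  # a running node always considers itself online
        for peer, seen in self.last_seen.items():
            view[peer] = "online" if now - seen <= self.deadtime else "OFFLINE"
        return view


def simulate_partition(deadtime: float = 30.0) -> tuple:
    """Cut the link between A and B and return both nodes' views after deadtime expires."""
    a = NodeView("A", deadtime)
    b = NodeView("B", deadtime)

    # t = 0: the link is up and each node hears the other.
    a.heard_from("B", 0.0)
    b.heard_from("A", 0.0)

    # The link is cut right after t = 0, so no more heartbeats arrive in either direction.
    now = deadtime + 1.0
    return a.status(now), b.status(now)


if __name__ == "__main__":
    view_a, view_b = simulate_partition()
    print("A sees:", view_a)
    print("B sees:", view_b)
```

Neither view is enough to tell which node actually lost its network. That takes some outside reference point (for a compute cluster, the head node's view could simply be treated as authoritative), and it is the reason HA clusters fall back on fencing.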
Take a look at this product: http://oscar.openclustergroup.org
Maybe it fits your needs better.

On 10/26/07, Artem Pervin <[EMAIL PROTECTED]> wrote:
> Hello, all,
>
> I'm new to the heartbeat project and I'm experiencing some problems
> in setting things up. I will be very grateful for any help.
>
> I want to use heartbeat to detect the disconnection of a node in a
> computational cluster. I also need to detect when the node is back on
> the network. On both events a custom script should be started either on
> the head node or on both the head node and the computational node.
>
> I've managed to set up heartbeat on two nodes (A and B) and I can
> observe the nodes' status with the help of crm_mon. When I simulate the
> loss of connectivity on node B (using ifdown), after some time I can see
> a change in the status of this node: on node A crm_mon reports that B is
> "OFFLINE" and A is "online", and on node B crm_mon reports that A is
> "OFFLINE" and B is "online". In this case I simply cannot determine from
> the crm_mon data (or cib.xml) which node is actually down and where I
> should shut down my processes.
>
> Moreover, when I turn the network interface on node B back on, the
> status of the node doesn't change at all. Only when I restart the
> heartbeat service on either node B or node A does the status of node B
> return to "online". That's pretty odd behavior, I think.
>
> My questions are the following:
> 1) Is heartbeat an appropriate solution for my task or should I use
> something else?
> 2) If heartbeat is fine, then what am I doing wrong? Why doesn't the
> status of node B return to the "online" state when I turn the network
> interface back on?
> 3) Is there any API to check the status of the node? Parsing cib.xml
> is not very convenient.
>
> Here's my configuration:
> CentOS, kernel 2.6.18, x86
> heartbeat version is 2.0.1 installed as rpm package
>
> ha.cf:
> ------------------------------------------------------------
> use_logd yes
> bcast eth0
> node A B
> crm on
> auto_failback on
> ------------------------------------------------------------
>
> authkeys
> ------------------------------------------------------------
> auth 1
> 1 sha1 helloworld
> ------------------------------------------------------------
>
> logd.cf
> ------------------------------------------------------------
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility daemon
> ------------------------------------------------------------
>
> Thank you for your help.
>
>
> --
> Artem Pervin
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>

--
Serge Dubrouski.
