On Nov 8, 2007, at 8:19 AM, Dejan Muhamedagic wrote:

Hi,

On Thu, Nov 08, 2007 at 11:32:07AM +0900, HIDEO YAMAUCHI wrote:
Hi,

I tested Heartbeat's behavior in a split-brain situation.
Specifically, I checked recovery from split-brain.

I tested the following scenario:

1) A two-node Active/Standby cluster.
2) Heartbeat is started while the LAN used for Heartbeat communication is down.
3) Within a few minutes, each node becomes DC.
4) A resource starts on each node.
5) Heartbeat communication is restored.

After this, the nodes did not recognize each other correctly.
I then tried to stop Heartbeat on each node.
Heartbeat stopped on one node, but not on the other.

Both version 2.1.2 and the development version gave the same results.

I think it is a problem that Heartbeat cannot be stopped on both nodes.

Not sure, but this looks suspicious:

dl380g5c/ha-log:crmd[31979]: 2007/11/08_10:40:16 info: do_shutdown_req: Sending shutdown request to DC: <null>

After that, crmd makes no effort to exit.

Another issue could be that, for about two minutes after the
split brain healed, that node couldn't set the DC:

crmd[31979]: 2007/11/08_10:38:42 info: update_dc: Set DC to <null> (<null>)
...

There's also an unusual period of inactivity:

crmd[31979]: 2007/11/08_10:38:48 notice: populate_cib_nodes: Node: dl380g5c (uuid: a9abdd7e-0a39-40cd-bea5-74494ad97f89)
crmd[31979]: 2007/11/08_10:40:11 notice: crmd_client_status_callback: Status update: Client dl380g5d/crmd now has status [offline]

The root cause seems to be that heartbeat is not providing client status messages (to say that the crmd processes are active) once the split-brain heals.

crmd[1350]: 2007/11/08_10:38:43 info: join_make_offer: Peer process on dl380g5c is not active (yet?)
crmd[1350]: 2007/11/08_10:40:11 WARN: do_state_transition: Only 1 of 2 cluster nodes are eligible to run resources - continue 0

Because of this, the crm doesn't consider dl380g5c online and the PE can't shut it down.


I think you need to file a bug for Alan about this.

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
