Hi,

On Thu, Nov 08, 2007 at 11:32:07AM +0900, HIDEO YAMAUCHI wrote:
> Hi,
> 
> I tested Heartbeat's behavior under split-brain.
> Specifically, I checked recovery from split-brain.
> 
> I assume the following situation.
> 
> 1) A two-node Active/Standby cluster.
> 2) Heartbeat was started while the LAN used for Heartbeat communication
> had a problem.
> 3) A DC starts on each node within a few minutes.
> 4) A resource starts on each node.
> 5) Heartbeat communication is restored.
> 
> After this, the nodes' recognition of each other was strange.
> I then tried to stop the Heartbeat service on each node.
> Heartbeat stopped on one node, but it did not stop on the other node.
> 
> Version 2.1.2 and the development version gave the same results.
> 
> I think it is a problem that Heartbeat does not stop on both nodes.

Not sure, but this looks suspicious:

dl380g5c/ha-log:crmd[31979]: 2007/11/08_10:40:16 info: do_shutdown_req: Sending shutdown request to DC: <null>

After that, crmd makes no effort to exit.

Another issue could be that for about two minutes, after the
split brain healed, that node couldn't set the DC:

crmd[31979]: 2007/11/08_10:38:42 info: update_dc: Set DC to <null> (<null>)
...
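
To find every place where crmd had no DC, the log can be scanned for
crmd entries that mention a <null> DC, such as the two quoted above.
This is a minimal Python sketch, not a Heartbeat tool. It assumes each
entry is on one line in ha-log (the excerpts here are wrapped by the
mailer) and that entries follow the layout seen above. The function
name null_dc_events is made up for illustration.

```python
import re

# Layout taken from the excerpts above:
#   crmd[PID]: YYYY/MM/DD_HH:MM:SS level: function: message
# search() is used so an optional "host/ha-log:" prefix is tolerated.
LINE_RE = re.compile(
    r"crmd\[(?P<pid>\d+)\]: "
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+): (?P<func>\w+): (?P<msg>.*)"
)


def null_dc_events(lines):
    """Return (timestamp, function, message) for crmd entries with a <null> DC."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a crmd entry in the assumed layout
        msg = m.group("msg")
        if "DC" in msg and "<null>" in msg:
            events.append((m.group("ts"), m.group("func"), msg))
    return events


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python null_dc.py /var/log/ha-log
    with open(sys.argv[1], errors="replace") as fh:
        for ts, func, msg in null_dc_events(fh):
            print(ts, func, msg)
```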

There's also an unusual period of inactivity (83 seconds between these
two consecutive entries):

crmd[31979]: 2007/11/08_10:38:48 notice: populate_cib_nodes: Node: dl380g5c (uuid: a9abdd7e-0a39-40cd-bea5-74494ad97f89)
crmd[31979]: 2007/11/08_10:40:11 notice: crmd_client_status_callback: Status update: Client dl380g5d/crmd now has status [offline]
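
Gaps like this can be looked for mechanically by comparing the
timestamps of consecutive entries. Below is a rough Python sketch,
assuming the YYYY/MM/DD_HH:MM:SS timestamp format seen in the excerpts
above. The function name find_gaps and the 60-second default threshold
are arbitrary choices for illustration, not anything Heartbeat defines.

```python
import re
from datetime import datetime

# Timestamp format as it appears in the ha-log excerpts above.
TS_RE = re.compile(r"\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}")
TS_FORMAT = "%Y/%m/%d_%H:%M:%S"


def find_gaps(lines, threshold_seconds=60):
    """Return (previous, current, seconds) for consecutive timestamped
    lines that are more than threshold_seconds apart."""
    gaps = []
    previous = None
    for line in lines:
        m = TS_RE.search(line)
        if m is None:
            continue  # skip lines without a timestamp
        current = datetime.strptime(m.group(0), TS_FORMAT)
        if previous is not None:
            seconds = (current - previous).total_seconds()
            if seconds > threshold_seconds:
                gaps.append((previous, current, seconds))
        previous = current
    return gaps
```

Feeding it only the crmd lines (for instance after filtering the log
for "crmd[") keeps entries from other daemons from hiding a crmd stall.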

Thanks,

Dejan

> Regards,
> Hideo Yamauchi.
> 



> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
