This looks like a common timeout issue while the network is coming up. Check whether corosync is bound to 127.0.0.1 instead of the real interface with:

    corosync-cmapctl | grep member
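
If it helps, the check above can be wrapped in a small shell function. This is only a sketch: the name loopback_members is made up for illustration, and it assumes the member lines printed by corosync-cmapctl contain the bound address as a literal IPv4 string.

```sh
# Hypothetical helper (not part of corosync): reads corosync-cmapctl
# output on stdin and prints the member lines that mention 127.0.0.1.
# Exit status is 0 if at least one such line was found, 1 otherwise.
loopback_members() {
    # Keep only member entries, then match 127.0.0.1 as a whole address
    # so that something like 127.0.0.10 is not reported by mistake.
    grep member | grep -E '(^|[^0-9.])127\.0\.0\.1($|[^0-9])'
}
```

It would be used as: corosync-cmapctl | loopback_members && echo "corosync is bound to loopback". Any line it prints suggests corosync started before the real interface was up.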
Also check whether a line like this one is appearing in /var/log/messages:

    WARN: cib_peer_callback: Discarding cib_apply_diff message (322) from server2: not in our membership

You can share the logs through any web service such as pastebin.com.

2012/10/25 James Guthrie <[email protected]>

> Hi all,
>
> I've been battling with this problem for a few hours now, I've gone over
> the obvious errors that it could have been with the guys in the linux-ha
> IRC. I'd really like some help in trying to solve this problem.
>
> I have a two node corosync/pacemaker cluster (corosync: 2.0.1 pacemaker:
> 1.1.8). I can get the cluster to work fine, but I can also very easily
> get the cluster into a state from which it seems unable to recover. All
> I have to do is reboot one of the cluster node's hosts. When doing so,
> any resources that were running on it are transferred to the second
> host. When the host comes back up though it appears as OFFLINE in the
> crm_mon of both cluster nodes.
>
> Regardless of what I do on the "offline" host, nothing gets better. If I
> however stop and restart corosync/pacemaker on the other "online" host,
> then everything seems to work again.
>
> I tried waiting a while with one node offline, after a while the online
> node went offline, stating that the other node was now offline. For a
> few minutes the output of crm_mon was different on both hosts (both
> thought the other was online, they were offline). Then finally it
> settled in the exact opposite state as previously.
>
> I've had a long look through the logs but I don't seem to be able to
> pinpoint anything particular that tells me that there is a reason for
> that host failing to be online.
>
> I'd like to attach the logs, but thought that approx 1500 lines of
> additional text in this e-mail might be a bit too much.
>
> How should I best attach the logs and config files? Which parts of the
> logs and config files would most likely reveal the problem in this case?
>
> Regards,
> James
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
