Hi Emmanuel,

corosync is bound to the correct interface on both hosts.
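
For anyone repeating this check, it can be scripted roughly as follows. This is only a sketch: `has_loopback_member` is a made-up helper name, and it assumes the member lines printed by corosync-cmapctl contain the bound address as plain text.

```shell
# Hypothetical helper: reads corosync-cmapctl output on stdin and succeeds
# (exit status 0) if any line containing "member" mentions a 127.x.x.x
# loopback address. The output format is an assumption, not verified here.
has_loopback_member() {
    grep member | grep -Eq '(^|[^0-9.])127\.[0-9]+\.[0-9]+\.[0-9]+'
}

# Usage on a cluster node:
#   corosync-cmapctl | has_loopback_member && echo "bound to loopback"
```

A non-zero exit status means no loopback address was found in the member lines.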

I looked for that line in the logs, but it didn't appear.
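
In case it helps anyone repeat the search, here is a rough sketch of it. `count_membership_warnings` is a made-up helper name, the log path is the one Emmanuel mentioned, and the pattern is simply the fixed tail of the warning he quoted.

```shell
# Hypothetical helper: prints how many lines of the given log file contain
# the "not in our membership" text from the quoted warning.
# Note: grep -c prints 0 (and returns a non-zero status) when nothing matches.
count_membership_warnings() {
    grep -c 'not in our membership' "$1"
}

# Usage (log path as mentioned in the thread; it may differ per distribution):
#   count_membership_warnings /var/log/messages
```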

My previous e-mail addressed to Ulrich contains logfiles and a broad 
explanation of the process that those logfiles capture.

Regards,
James

On 10/25/2012 06:34 PM, Emmanuel Saint-Joanis wrote:
> Looks like a common timeout issue while the network is coming up.
>
> Check whether corosync is bound to 127.0.0.1 instead of the real interface with:
> corosync-cmapctl | grep member
>
> Also check that no line like this appears in /var/log/messages:
> WARN: cib_peer_callback: Discarding cib_apply_diff message (322) from
> server2: not in our membership
>
> Post the logs to a web service such as pastebin.com <http://pastebin.com>.
>
> 2012/10/25 James Guthrie <[email protected] <mailto:[email protected]>>
>
>     Hi all,
>
>     I've been battling with this problem for a few hours now. I've gone over
>     the obvious possible causes with the guys in the linux-ha IRC channel,
>     and I'd really like some help in trying to solve it.
>
>     I have a two node corosync/pacemaker cluster (corosync: 2.0.1 pacemaker:
>     1.1.8). I can get the cluster to work fine, but I can also very easily
>     get the cluster into a state from which it seems unable to recover. All
>     I have to do is reboot one of the cluster nodes' hosts. When doing so,
>     any resources that were running on it are transferred to the second
>     host. When the host comes back up though it appears as OFFLINE in the
>     crm_mon of both cluster nodes.
>
>     Regardless of what I do on the "offline" host, nothing gets better. If I
>     however stop and restart corosync/pacemaker on the other "online" host,
>     then everything seems to work again.
>
>     I tried waiting a while with one node offline. After a while the online
>     node went offline, stating that the other node was now offline. For a
>     few minutes the output of crm_mon differed between the two hosts (each
>     thought the other was online and that it was itself offline). Then it
>     finally settled in the exact opposite state from before.
>
>     I've had a long look through the logs but I don't seem to be able to
>     pinpoint anything particular that tells me that there is a reason for
>     that host failing to be online.
>
>     I'd like to attach the logs, but thought that approx 1500 lines of
>     additional text in this e-mail might be a bit too much.
>
>     How should I best attach the logs and config files? Which parts of the
>     logs and config files would most likely reveal the problem in this case?
>
>     Regards,
>     James
>
>     _______________________________________________
>     Linux-HA mailing list
>     [email protected] <mailto:[email protected]>
>     http://lists.linux-ha.org/mailman/listinfo/linux-ha
>     See also: http://linux-ha.org/ReportingProblems
>
>
