Hi James,

does the cluster stack start automatically when the offline host boots? If so, the node probably won't come online immediately. The syslog of the offline node (unless it has been redirected) should give initial clues about what is going on. Also check the syslog of the online node; watching it with tail -f while the other node boots may be a good idea.
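To make the tail -f suggestion easier to follow on a busy host, here is a minimal sketch of a filter that keeps only syslog lines from the cluster daemons. It is an illustration, not part of any corosync or pacemaker tooling: the helper name `cluster_lines` is hypothetical, the daemon list is an assumption about how the processes are tagged in syslog for corosync 2.x and pacemaker 1.1.x, and the log path in the usage comment is an assumption that differs between distributions.

```python
import re

# Assumption: process tags as they commonly appear in syslog for
# corosync 2.x / pacemaker 1.1.x. Adjust to match your own logs.
CLUSTER_DAEMONS = (
    "corosync", "pacemakerd", "crmd", "cib",
    "stonith-ng", "attrd", "pengine", "lrmd",
)

# Matches a syslog process tag such as "crmd[1234]:" or "corosync:".
_TAG = re.compile(
    r"\b(" + "|".join(re.escape(d) for d in CLUSTER_DAEMONS) + r")(\[\d+\])?:"
)


def cluster_lines(lines):
    """Yield only the lines whose process tag names a cluster daemon."""
    for line in lines:
        if _TAG.search(line):
            yield line


# Usage sketch (hypothetical script name; the log path is an assumption,
# e.g. /var/log/syslog on some systems and /var/log/messages on others):
#
#   tail -f /var/log/syslog | python3 cluster_filter.py
#
# with the script body being:
#
#   import sys
#   for line in cluster_lines(sys.stdin):
#       sys.stdout.write(line)
#       sys.stdout.flush()
```

Running this on the online node while the other node boots should leave only the membership and resource-manager messages, which are the ones most likely to show why the rebooted node stays OFFLINE.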
Regards,
Ulrich

>>> James Guthrie <[email protected]> wrote on 25.10.2012 at 18:13 in message
<[email protected]>:
> Hi all,
>
> I've been battling with this problem for a few hours now, I've gone over
> the obvious errors that it could have been with the guys in the linux-ha
> IRC. I'd really like some help in trying to solve this problem.
>
> I have a two node corosync/pacemaker cluster (corosync: 2.0.1 pacemaker:
> 1.1.8). I can get the cluster to work fine, but I can also very easily
> get the cluster into a state from which it seems unable to recover. All
> I have to do is reboot one of the cluster node's hosts. When doing so,
> any resources that were running on it are transferred to the second
> host. When the host comes back up though it appears as OFFLINE in the
> crm_mon of both cluster nodes.
>
> Regardless of what I do on the "offline" host, nothing gets better. If I
> however stop and restart corosync/pacemaker on the other "online" host,
> then everything seems to work again.
>
> I tried waiting a while with one node offline, after a while the online
> node went offline, stating that the other node was now offline. For a
> few minutes the output of crm_mon was different on both hosts (both
> thought the other was online, they were offline). Then finally it
> settled in the exact opposite state as previously.
>
> I've had a long look through the logs but I don't seem to be able to
> pinpoint anything particular that tells me that there is a reason for
> that host failing to be online.
>
> I'd like to attach the logs, but thought that approx 1500 lines of
> additional text in this e-mail might be a bit too much.
>
> How should I best attach the logs and config files? Which parts of the
> logs and config files would most likely reveal the problem in this case?
>
> Regards,
> James
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
