Hi Michael,

Yes, that is exactly the problem I'm having.
As far as I can tell, everything is working fine (or appears to be) on the communication layer. I have not once had a problem with corosync. As I mentioned before, I can get the cluster to work perfectly fine for periods of time, but I can also easily get it into this state, with one node online and the other offline.

What confuses me is that the cluster doesn't seem to try to recover from this state in any way. Even more vexing is that nothing seems to point to why this is happening.

Regards,
James

P.S. I'm working with your "Clusterbau" book!

On 10/26/2012 03:34 PM, Michael Schwartzkopff wrote:
>> Hi Emmanuel,
> (...)
>
> Hi,
>
> as far as I understood, your problem is that one node stays marked "offline"
> after a reboot and a subsequent manual start of corosync and pacemaker.
>
> Did you check if the nodes see each other if you only start corosync without
> pacemaker? Please check the corosync tools. You could also tcpdump the traffic
> on the cluster port (default 5405) to see if the nodes see each other.
>
> Greetings,
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
