On Wed, Oct 31, 2012 at 11:11 PM, James Guthrie <[email protected]> wrote:
> Hi all,
>
> it appears as though this is the problem. The /etc/hosts file specifies
> per-interface hostnames e.g.
>
> 192.168.200.170 r4-eth1
>
> This explains the difference in the hostname that appears to be causing
> a problem.

Do all the nodes have that mapping though?

>
> I have used a nodelist to specify the nodes of the cluster, their ids
> and their names. This seems to have resolved the problem. I haven't been
> able to do enough definitive testing.
>
> The "nodelist" feature is entirely undocumented, a look at the source
> code confirmed that there was in fact a "name" field that would be
> looked for in the config. When will the documentation be updated?

The focus is slowly shifting to documentation now.
This is one area in particular that needs documenting.

>
> I understand that the logs were displaying the warning signs of
> something being wrong with the configuration, but it wasn't really
> enough to be able to source the problem. Maybe this could be looked into?

Absolutely. Ideally we'd be able to make it "just work" without the
nodelist even.
But I need to get my head around your configuration first :)

>
> Regards,
> James
>
>
> On 10/30/2012 01:03 PM, Michael Schwartzkopff wrote:
>>> Hi Michael,
>>>
>>> I have managed to successfully configure corosync with udpu, it
>>> unfortunately hasn't made a difference in the behaviour of the cluster.
>>>
>>> I have found that I don't even need to restart the host in order to get
>>> this behaviour - all I need to do is stop and restart corosync and
>>> pacemaker on *one* of the hosts. To be precise: I've been able to narrow
>>> it down to only one of the two hosts (r3). If I reboot the host, or
>>> restart the services on r4 everything works fine. If I try the same with
>>> r3, I have problems.
>>>
>>> I feel as though the answer may lie in the logfiles, the
>>> intercommunication between the individual components of the HA software
>>> makes it a bit difficult to accurately read the logfiles as an outsider
>>> to this software.
>>> I have attached the logs of both r3 and r4 after
>>> reproducing this effect this afternoon, they are much shorter to read
>>> than those previously:
>>>
>>> corosync-r3.log: http://pastebin.com/ZAhh5nax
>>> corosync-r4.log: http://pastebin.com/SETtqnZM
>>>
>>> Are there any other steps I could take in debugging this behaviour?
>>>
>>> Regards,
>>> James
>>
>> hi,
>>
>> I think you have a problem in the naming of your clusters. In the first log
>> it learns the name from DNS:
>>
>> Oct 29 13:41:14 [21723] r3 crmd: notice: corosync_node_name:
>> Inferred node name 'r4-eth1' for nodeid 2 from DNS
>>
>> if that does not fit to the name of the node it might cause the problems.
>>
>> Greetings,
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
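
For readers hitting the same symptom, the nodelist approach James
describes might look roughly like the corosync.conf fragment below. It
is a sketch, not his actual configuration. The thread confirms only the
"name" field, the address 192.168.200.170 for r4 and nodeid 2 for that
node. The r3 address and nodeid 1 are placeholders, and ring0_addr and
nodeid are the usual corosync 2.x nodelist keys rather than anything
quoted in this thread.

```
nodelist {
    node {
        # Placeholder address: the real r3 address is not given in the thread
        ring0_addr: 192.168.200.169
        # Assumed id: only nodeid 2 (r4) appears in the logs
        nodeid: 1
        name: r3
    }
    node {
        # Address that /etc/hosts maps to r4-eth1
        ring0_addr: 192.168.200.170
        nodeid: 2
        # Explicit name, so it need not be inferred from DNS
        name: r4
    }
}
```

With names set explicitly, the node name should no longer depend on what
a lookup of the ring address happens to return on each host.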

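
Whether every node carries the same mapping can also be checked
mechanically. The sketch below is one possible way, assuming the usual
resolver behaviour that the first name on an /etc/hosts line is the one
a reverse lookup returns. The function names and the table of expected
names are made up for illustration and are not part of corosync or
pacemaker.

```python
def parse_hosts(text):
    """Map each address in hosts-file text to the list of names on its lines."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no name is not a usable entry
        mapping.setdefault(fields[0], []).extend(fields[1:])
    return mapping


def mismatched_addresses(hosts_text, expected):
    """Return {address: first name found} where that name differs from
    the expected node name (None if the address has no entry at all)."""
    mapping = parse_hosts(hosts_text)
    problems = {}
    for address, node_name in expected.items():
        names = mapping.get(address, [])
        first = names[0] if names else None
        if first != node_name:
            problems[address] = first
    return problems


if __name__ == "__main__":
    # Hypothetical expectation: the cluster should know this address as "r4".
    expected = {"192.168.200.170": "r4"}
    with open("/etc/hosts") as handle:
        print(mismatched_addresses(handle.read(), expected))
```

Running it on each node with the same expected table should show whether
one host resolves an address to a per-interface name such as r4-eth1
while another does not.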