Hi Linux HA list,

I'm having the same problem that was reported previously: two servers paired up that are not communicating with each other. Each shows the other as offline in crm_mon. I'm running Linux HA 2.1.4 in CRM mode. I see these messages in the log file:

cib[20479]: 2010/01/28_08:58:25 info: write_cib_contents: Wrote version 0.601.1 of the CIB to disk (digest: 22cd418a378a5ee22c1cc6347fa69817)
cib[18546]: 2010/01/28_08:58:25 WARN: cib_peer_callback: Discarding cib_apply_diff message (732b9) from so1b: not in our membership
cib[18546]: 2010/01/28_08:58:25 WARN: cib_peer_callback: Discarding cib_apply_diff message (732bb) from so1b: not in our membership
cib[20479]: 2010/01/28_08:58:25 info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
cib[20479]: 2010/01/28_08:58:25 info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest: /var/lib/heartbeat/crm/cib.xml.sig.last)
cib[18546]: 2010/01/28_08:58:26 WARN: cib_peer_callback: Discarding cib_apply_diff message (732c7) from so1b: not in our membership

Each server appears to be rejecting the other from membership. They were working fine and arbitrating an IPaddr2 resource before a split brain occurred. After the split brain recovered, these errors started appearing. I've verified with tcpdump that heartbeat connectivity is intact.

Any ideas? Thanks in advance for any help!

Regards,
Eric Blau

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Andrew Beekhof
Sent: Friday, 24 July, 2009 08:39
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Node ha2 is not sync with node ha1

What version are you using?
There was a bug like this but it was fixed a long time ago

On Wed, Jul 22, 2009 at 10:02 AM, Ahmed Munir<[email protected]> wrote:
> Hi all,
>
> Hoping you all fine. I've got 2 machines and I've installed Linux HA and
> OpenSIPs on them and configured them as an active-active scenario. Machine 1
> named ha1, is assigned with virtual IP 192.168.0.184 and machine 2 named
> ha2, is assigned with virtual IP 192.168.0.185.
>
> The integration between HA and OpenSIPs is working fine. Like if I stop the
> service of HA, machine ha1 comes down, its resources are taken by machine
> ha2 and when ha1 comes online, ha1 take its resources back from machine ha2
> and vice versa.
>
> If I turn off ha1 machine its resources are taken by machine ha2 and
> when ha1 comes online, ha1 take its resources back from machine ha2 which is
> working fine. But when I turn off ha2 machine its resources are taken by
> machine ha1 and when ha2 comes online, and I check the status of ha2 using
> crm_mon command,
> it shows me weird status as I'm listing down below;
>
> On ha1 machine;
>
> Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): online
> Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): offline
>
> IPaddr_1 (heartbeat::ocf:IPaddr): Started ha1
> IPaddr_2 (heartbeat::ocf:IPaddr): Started ha1
> OpenSips_1 (heartbeat::ocf:OpenSips): Started ha1
> OpenSips_2 (heartbeat::ocf:OpenSips): Started ha1
>
> On ha2 machine;
>
> Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): offline
> Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): online
>
> IPaddr_1 (heartbeat::ocf:IPaddr): Started ha2
> IPaddr_2 (heartbeat::ocf:IPaddr): Started ha2
> OpenSips_1 (heartbeat::ocf:OpenSips): Started ha2
> OpenSips_2 (heartbeat::ocf:OpenSips): Started ha2
>
> Or sometimes on ha2 machine;
>
> Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): online
> Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): offline
>
> IPaddr_1 (heartbeat::ocf:IPaddr): Started ha1
> IPaddr_2 (heartbeat::ocf:IPaddr): Started ha1
> OpenSips_1 (heartbeat::ocf:OpenSips): Started ha1
> OpenSips_2 (heartbeat::ocf:OpenSips): Started ha1
>
> After that I've checked logs and I'm getting these errors as listed below;
>
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding cib_apply_diff message (3a9) from ha2: not in our membership
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding cib_apply_diff message (3aa) from ha2: not in our membership
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding cib_apply_diff message (3ab) from ha2: not in our membership
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding cib_apply_diff message (3ac) from ha2: not in our membership
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding cib_apply_diff message (3ad) from ha2: not in our membership
> Jul 22 14:12:07 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding cib_apply_diff message (3b0) from ha2: not in our membership
> Jul 22 14:12:07 ha1 ccm: [9977]: ERROR: llm_set_uptime: Negative uptime -1778384896 for node 0 [ha1]
> Jul 22 14:12:07 ha1 ccm: [9977]: ERROR: llm_set_uptime: Negative uptime -1879048192 for node 1 [ha2]
>
> Even I've configured same settings on both machines but I don't know why
> I'm getting these errors.
>
> Further added I'm attaching cib.xml, OpenSips (which I created resource file
> for OpenSIPs), ha.cf and log files. Kindly do have a look and update
> me ASAP.
>
>
> --
> Regards,
>
> Ahmed Munir
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
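
For anyone comparing logs from both nodes, here is a rough sketch of a way to tally which peer each node is rejecting. It only matches the "not in our membership" warning text exactly as it appears in the logs quoted in this thread; the function name count_discards and the regular expression are my own, not part of Heartbeat, and other versions may word the warning differently.

```python
import re
from collections import Counter

# Matches the warning text as quoted in this thread. The pattern is an
# assumption based on those samples only, not on the Heartbeat source.
DISCARD_RE = re.compile(
    r"cib_peer_callback: Discarding (?P<op>\S+) message "
    r"\((?P<msg_id>[0-9a-f]+)\) from (?P<peer>\S+): not in our membership"
)


def count_discards(lines):
    """Return a Counter mapping peer name -> number of discarded CIB messages."""
    counts = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the two reports above.
    sample = [
        "cib[18546]: 2010/01/28_08:58:25 WARN: cib_peer_callback: Discarding "
        "cib_apply_diff message (732b9) from so1b: not in our membership",
        "cib[20479]: 2010/01/28_08:58:25 info: write_cib_contents: Wrote "
        "version 0.601.1 of the CIB to disk",
        "Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding "
        "cib_apply_diff message (3a9) from ha2: not in our membership",
    ]
    for peer, n in count_discards(sample).most_common():
        print(f"{peer}: {n} discarded CIB message(s)")
```

Running it over the log from each node shows whether the rejection is mutual, as it appears to be in both reports here. It does not diagnose the cause.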
