Hi Linux HA list,

I'm seeing the same problem that was reported previously in the thread quoted
below: two paired servers are not communicating with each other.  Each shows
the other as offline in crm_mon.  I'm running Linux HA 2.1.4 in CRM mode.  I
see these messages in the log file:

cib[20479]: 2010/01/28_08:58:25 info: write_cib_contents: Wrote version
0.601.1 of the CIB to disk (digest: 22cd418a378a5ee22c1cc6347fa69817)
cib[18546]: 2010/01/28_08:58:25 WARN: cib_peer_callback: Discarding
cib_apply_diff message (732b9) from so1b: not in our membership
cib[18546]: 2010/01/28_08:58:25 WARN: cib_peer_callback: Discarding
cib_apply_diff message (732bb) from so1b: not in our membership
cib[20479]: 2010/01/28_08:58:25 info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
/var/lib/heartbeat/crm/cib.xml.sig)
cib[20479]: 2010/01/28_08:58:25 info: retrieveCib: Reading cluster
configuration from: /var/lib/heartbeat/crm/cib.xml.last (digest:
/var/lib/heartbeat/crm/cib.xml.sig.last)
cib[18546]: 2010/01/28_08:58:26 WARN: cib_peer_callback: Discarding
cib_apply_diff message (732c7) from so1b: not in our membership

Each server appears to be rejecting the other from membership.  They were
working fine and arbitrating an IPaddr2 resource before a split brain
occurred.  After the split brain recovered, these errors started appearing.
I've verified with tcpdump that heartbeat connectivity is intact.
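
To gauge how often each side is rejecting the other, the warnings above can be
tallied per peer.  This is only a rough sketch: the function name
count_discards is made up for illustration, and the pattern assumes the
warning text appears on a single line in the real log file, worded exactly as
in the excerpt above.

```python
import re
from collections import Counter

# Matches the warning text from the log excerpt. The prefix (timestamp,
# process name, pid) is ignored, so differently formatted log lines work too.
DISCARD_RE = re.compile(
    r"cib_peer_callback: Discarding (\S+) message \((\w+)\) "
    r"from ([^\s:]+): not in our membership"
)


def count_discards(lines):
    """Return a Counter mapping peer node name -> number of discarded messages."""
    counts = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts
```

Passing an open log file to count_discards() on each node gives one count per
rejected peer, for example a count for so1b on the node that logged the
warnings above.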

Any ideas?

Thanks in advance for any help!

Regards,
Eric Blau


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andrew Beekhof
Sent: Friday, 24 July, 2009 08:39
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Node ha2 is not sync with node ha1

What version are you using?
There was a bug like this, but it was fixed a long time ago.

On Wed, Jul 22, 2009 at 10:02 AM, Ahmed Munir <[email protected]> wrote:
> Hi all,
> I hope you are all well. I have two machines with Linux HA and OpenSIPs
> installed, configured in an active-active scenario. Machine 1, named ha1,
> is assigned virtual IP 192.168.0.184, and machine 2, named ha2, is assigned
> virtual IP 192.168.0.185.
>
> The integration between HA and OpenSIPs is working fine. For example, if I
> stop the HA service and machine ha1 goes down, its resources are taken over
> by machine ha2, and when ha1 comes back online it takes its resources back
> from ha2, and vice versa.
>
> If I turn off the ha1 machine, its resources are taken over by ha2, and
> when ha1 comes back online it takes its resources back from ha2, which
> works fine. But when I turn off the ha2 machine, its resources are taken
> over by ha1, and when ha2 comes back online and I check the status of ha2
> with the crm_mon command, it shows a weird status, as listed below:
>
> On ha1 machine;
>
> Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): online
> Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): offline
>
> IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha1
> IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha1
> OpenSips_1      (heartbeat::ocf:OpenSips):      Started ha1
> OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha1
>
> On ha2 machine;
>
> Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): offline
> Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): online
>
> IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha2
> IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha2
> OpenSips_1      (heartbeat::ocf:OpenSips):      Started ha2
> OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha2
>
> Or sometimes on ha2 machine;
>
> Node: ha1 (e651c120-b9a1-489a-baf7-caf0028ad540): online
> Node: ha2 (70503c2e-bb4a-48f8-aab3-53696656a4d0): offline
>
> IPaddr_1           (heartbeat::ocf:IPaddr):        Started ha1
> IPaddr_2           (heartbeat::ocf:IPaddr):        Started ha1
> OpenSips_1      (heartbeat::ocf:OpenSips):      Started ha1
> OpenSips_2      (heartbeat::ocf:OpenSips):      Started ha1
>
> After that I've checked logs and I'm getting these errors as listed below;
>
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding
> cib_apply_diff message (3a9) from ha2: not in our membership
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding
> cib_apply_diff message (3aa) from ha2: not in our membership
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding
> cib_apply_diff message (3ab) from ha2: not in our membership
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding
> cib_apply_diff message (3ac) from ha2: not in our membership
> Jul 22 14:12:06 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding
> cib_apply_diff message (3ad) from ha2: not in our membership
> Jul 22 14:12:07 ha1 cib: [9978]: WARN: cib_peer_callback: Discarding
> cib_apply_diff message (3b0) from ha2: not in our membership
> Jul 22 14:12:07 ha1 ccm: [9977]: ERROR: llm_set_uptime: Negative uptime
> -1778384896 for node 0 [ha1]
> Jul 22 14:12:07 ha1 ccm: [9977]: ERROR: llm_set_uptime: Negative uptime
> -1879048192 for node 1 [ha2]
>
> I've configured the same settings on both machines, but I don't know why
> I'm getting these errors.
>
> I'm also attaching cib.xml, OpenSips (the resource file I created for
> OpenSIPs), ha.cf and the log files. Kindly have a look and update me ASAP.
>
>
> --
> Regards,
>
> Ahmed Munir
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>