Re: [Linux-cluster] daemon cpg_join error retrying

Digimer Wed, 29 Oct 2014 15:36:18 -0700

On 29/10/14 06:16 PM, Andrew Beekhof wrote:

On 30 Oct 2014, at 9:06 am, Lax Kota (lkota) <[email protected]> wrote:

I wonder if there is a mismatch between the cluster name in cluster.conf and 
the cluster name the GFS filesystem was created with.

How do I check the cluster name of a GFS file system? I have had a similar
configuration running fine in multiple other setups with no such issue.


I don't really recall. Hopefully someone more familiar with GFS2 can chime in.


# gfs2_tool sb /dev/c01n01_vg0/shared table
current lock table name = "an-cluster-01:shared"

Replace with your device, of course. :)
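
If you want to script the comparison Andrew suggested, here is a minimal sketch. It assumes a cman-based cluster whose name is the `name` attribute of the `<cluster>` tag in /etc/cluster/cluster.conf, and `gfs2_tool` output in the format shown above. The lock table has the form `clustername:fsname`, and the part before the colon has to match the running cluster's name for a lock_dlm mount to be accepted. The helper names (`locktable_cluster`, `conf_cluster_name`, `check_gfs2_cluster`) are hypothetical, and the sed-based XML parsing is deliberately crude.

```shell
#!/bin/sh
# Hypothetical helpers: compare the cluster part of a GFS2 lock table
# ("clustername:fsname") with the cluster name in cluster.conf.
# Assumes gfs2_tool prints: current lock table name = "cluster:fs"

# Read "gfs2_tool sb <dev> table" output on stdin; print the cluster part.
locktable_cluster() {
    sed -n 's/.*lock table name = "\([^":]*\):.*/\1/p'
}

# Read cluster.conf on stdin; print the name attribute of the <cluster> tag.
# The [[:space:]] after "<cluster" keeps <clusternode name=...> from matching.
conf_cluster_name() {
    sed -n 's/.*<cluster[[:space:]][^>]*name="\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: check_gfs2_cluster <device> [cluster.conf path]
# Returns 0 on a match, 1 on a mismatch.
check_gfs2_cluster() {
    dev=$1
    conf=${2:-/etc/cluster/cluster.conf}
    fs_cluster=$(gfs2_tool sb "$dev" table | locktable_cluster)
    cfg_cluster=$(conf_cluster_name < "$conf")
    if [ "$fs_cluster" = "$cfg_cluster" ]; then
        echo "OK: lock table cluster '$fs_cluster' matches cluster.conf"
    else
        echo "MISMATCH: filesystem expects '$fs_cluster', cluster.conf has '$cfg_cluster'"
        return 1
    fi
}
```

Usage would be along the lines of `check_gfs2_cluster /dev/c01n01_vg0/shared`. A MISMATCH line points at exactly the kind of problem Andrew suspected.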


Also, in one other setup I am seeing a repeated flood of 'A processor joined
or left the membership and a new membership was formed' messages every 4 secs.
I am running with the default TOTEM settings, with a token timeout of 10 secs.
Even after I increase the token and consensus values, it keeps flooding the
same message at the newly defined consensus interval (e.g. if I increase it
to 10 secs, I see a new membership formed message every 10 secs).

Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]:   [CPG   ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)
Oct 29 14:58:10 VSM76-VSOM64 corosync[28388]:   [MAIN  ] Completed service synchronization, ready to provide service.

Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]:   [CPG   ] chosen downlist: sender r(0) ip(172.28.0.64) ; members(old:2 left:0)
Oct 29 14:58:14 VSM76-VSOM64 corosync[28388]:   [MAIN  ] Completed service synchronization, ready to provide service.


It does not sound like your network is particularly healthy.
Are you using multicast or udpu? If multicast, it might be worth trying udpu.
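
To answer the multicast-or-udpu question quickly, a rough sketch that reports which transport a config file asks for. It assumes either a cman cluster.conf, where unicast is selected with `<cman transport="udpu"/>`, or a corosync.conf with a `transport:` line in the `totem` section. When neither is present it assumes the default multicast transport (`udp`). The function name `detect_transport` is hypothetical, and it only reads the file on disk, not the configuration of the running corosync.

```shell
#!/bin/sh
# Hypothetical helper: print the totem transport requested by a cluster config.
# Handles a cman-style <cman ... transport="..."/> tag or a corosync.conf
# "transport: ..." line; falls back to "udp" (multicast), assumed to be the
# default when no transport is set.
detect_transport() {
    conf=$1
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # First expression: cluster.conf attribute; second: corosync.conf directive.
    # Commented-out corosync.conf lines ("# transport: udpu") are not matched.
    t=$(sed -n \
        -e 's/.*<cman[[:space:]][^>]*transport="\([^"]*\)".*/\1/p' \
        -e 's/^[[:space:]]*transport:[[:space:]]*\([[:alnum:]]*\).*/\1/p' \
        "$conf" | head -n 1)
    echo "${t:-udp}"
}
```

For example, `detect_transport /etc/cluster/cluster.conf`. If it prints `udp`, the cluster is running on multicast.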


Agreed. Persistent multicast required?

Thanks
Lax


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Andrew Beekhof
Sent: Wednesday, October 29, 2014 2:42 PM
To: linux clustering
Subject: Re: [Linux-cluster] daemon cpg_join error retrying

On 30 Oct 2014, at 8:38 am, Lax Kota (lkota) <[email protected]> wrote:

Hi All,

In one of my setups, I keep getting 'gfs_controld[10744]: daemon cpg_join
error retrying'. I have a 2-node setup with pacemaker and corosync.


I wonder if there is a mismatch between the cluster name in cluster.conf and 
the cluster name the GFS filesystem was created with.


Even after I force-kill the pacemaker processes, reboot the server, and bring
pacemaker back up, it keeps giving the cpg_join error. Is there any way to fix
this issue?


Thanks
Lax

--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster






--
Digimer
Papers and Projects: https://alteeve.ca/w/

What if the cure for cancer is trapped in the mind of a person without access to education?

