Hi Fabio,

removing UDPU does not change the behavior: the new node still doesn't join the cluster and still wants to fence node 01. It still feels like a split brain of some sort. How do you join a new node, using /etc/init.d/cman start or using cman_tool / dlm_tool join?
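For reference, this is roughly how I bring a new node in today, plus the manual variant I was asking about (only a sketch; the exact ordering, and whether the manual cman_tool/fence_tool steps are even needed, is my assumption):

    # on the new node, after copying the updated cluster.conf to it
    /etc/init.d/cman start      # corosync/cman, fenced, dlm_controld, gfs_controld
    /etc/init.d/clvmd start     # clvmd then joins its dlm lockspace

    # the manual route I was referring to (my understanding, untested)
    cman_tool join -w           # join the membership and wait until joined
    fence_tool join             # join the fence domain
    /etc/init.d/clvmd start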
Bjoern

On Sat, Feb 22, 2014 at 10:16 PM, Fabio M. Di Nitto <fdini...@redhat.com> wrote:

> On 02/22/2014 08:05 PM, Bjoern Teipel wrote:
> > Thanks, Fabio, for replying to my request.
> >
> > I'm using stock CentOS 6.4 versions and no rm, just clvmd and dlm.
> >
> > Name    : cman           Relocations: (not relocatable)
> > Version : 3.0.12.1       Vendor: CentOS
> > Release : 49.el6_4.2     Build Date: Tue 03 Sep 2013 02:18:10 AM PDT
> >
> > Name    : lvm2-cluster   Relocations: (not relocatable)
> > Version : 2.02.98        Vendor: CentOS
> > Release : 9.el6_4.3      Build Date: Tue 05 Nov 2013 07:36:18 AM PST
> >
> > Name    : corosync       Relocations: (not relocatable)
> > Version : 1.4.1          Vendor: CentOS
> > Release : 15.el6_4.1     Build Date: Tue 14 May 2013 02:09:27 PM PDT
> >
> > My question is based on this problem I have had since January:
> >
> > Whenever I add a new node (I put it into cluster.conf and reload with
> > cman_tool version -r -S) I end up with situations where the new node
> > wants to gain quorum and starts to fence the existing pool master,
> > which appears to generate some sort of split cluster. Does it work at
> > all, given that corosync and dlm do not know about the recently added
> > node?
>
> I can see you are using UDPU and that could be the culprit. Can you drop
> UDPU and work with multicast?
>
> Jan/Chrissie: do you remember if we support adding nodes at runtime with
> UDPU?
>
> The standalone node should not have quorum at all and should not be able
> to fence anybody to start with.
>
> > New Node
> > ==========
> >
> > Node  Sts   Inc   Joined               Name
> >    1   X      0                        hv-1
> >    2   X      0                        hv-2
> >    3   X      0                        hv-3
> >    4   X      0                        hv-4
> >    5   X      0                        hv-5
> >    6   M     80   2014-01-07 21:37:42  hv-6   <--- host added
> >
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] The network interface [10.14.18.77] is now up.
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] Using quorum provider quorum_cman
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> > Jan 7 21:37:42 hv-1 corosync[12564]: [CMAN ] CMAN 3.0.12.1 (built Sep 3 2013 09:17:34) started
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: corosync configuration service
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: corosync profile loading service
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] Using quorum provider quorum_cman
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
> > Jan 7 21:37:42 hv-1 corosync[12564]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member {10.14.18.65}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member {10.14.18.67}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member {10.14.18.68}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member {10.14.18.70}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member {10.14.18.66}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member {10.14.18.77}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > Jan 7 21:37:42 hv-1 corosync[12564]: [CMAN ] quorum regained, resuming activity
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] This node is within the primary component and will provide service.
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] Members[1]: 6
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] Members[1]: 6
> > Jan 7 21:37:42 hv-1 corosync[12564]: [CPG ] chosen downlist: sender r(0) ip(10.14.18.77) ; members(old:0 left:0)
> > Jan 7 21:37:42 hv-1 corosync[12564]: [MAIN ] Completed service synchronization, ready to provide service.
> > Jan 7 21:37:46 hv-1 fenced[12620]: fenced 3.0.12.1 started
> > Jan 7 21:37:46 hv-1 dlm_controld[12643]: dlm_controld 3.0.12.1 started
> > Jan 7 21:37:47 hv-1 gfs_controld[12695]: gfs_controld 3.0.12.1 started
> > Jan 7 21:37:54 hv-1 fenced[12620]: fencing node hv-b1clcy1
> >
> > sudo -i corosync-objctl | grep member
> >
> > totem.interface.member.memberaddr=hv-1
> > totem.interface.member.memberaddr=hv-2
> > totem.interface.member.memberaddr=hv-3
> > totem.interface.member.memberaddr=hv-4
> > totem.interface.member.memberaddr=hv-5
> > totem.interface.member.memberaddr=hv-6
> > runtime.totem.pg.mrp.srp.members.6.ip=r(0) ip(10.14.18.77)
> > runtime.totem.pg.mrp.srp.members.6.join_count=1
> > runtime.totem.pg.mrp.srp.members.6.status=joined
> >
> > Existing Node
> > =============
> >
> > member 6 has not been added to the quorum list:
> >
> > Jan 7 21:36:28 hv-1 corosync[7769]: [QUORUM] Members[4]: 1 2 3 5
> > Jan 7 21:37:54 hv-1 corosync[7769]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > Jan 7 21:37:54 hv-1 corosync[7769]: [CPG ] chosen downlist: sender r(0) ip(10.14.18.65) ; members(old:4 left:0)
> >
> > Node  Sts   Inc    Joined               Name
> >    1   M    4468   2013-12-10 14:33:27  hv-1
> >    2   M    4468   2013-12-10 14:33:27  hv-2
> >    3   M    5036   2014-01-07 17:51:26  hv-3
> >    4   X    4468                        hv-4   (dead at the moment)
> >    5   M    4468   2013-12-10 14:33:27  hv-5
> >    6   X       0                        hv-6   <--- added
> >
> > Jan 7 21:36:28 hv-1 corosync[7769]: [QUORUM] Members[4]: 1 2 3 5
> > Jan 7 21:37:54 hv-1 corosync[7769]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > Jan 7 21:37:54 hv-1 corosync[7769]: [CPG ] chosen downlist: sender r(0) ip(10.14.18.65) ; members(old:4 left:0)
> > Jan 7 21:37:54 hv-1 corosync[7769]: [MAIN ] Completed service synchronization, ready to provide service.
> >
> > totem.interface.member.memberaddr=hv-1
> > totem.interface.member.memberaddr=hv-2
> > totem.interface.member.memberaddr=hv-3
> > totem.interface.member.memberaddr=hv-4
> > totem.interface.member.memberaddr=hv-5
> > runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(10.14.18.65)
> > runtime.totem.pg.mrp.srp.members.1.join_count=1
> > runtime.totem.pg.mrp.srp.members.1.status=joined
> > runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(10.14.18.66)
> > runtime.totem.pg.mrp.srp.members.2.join_count=1
> > runtime.totem.pg.mrp.srp.members.2.status=joined
> > runtime.totem.pg.mrp.srp.members.4.ip=r(0) ip(10.14.18.68)
> > runtime.totem.pg.mrp.srp.members.4.join_count=1
> > runtime.totem.pg.mrp.srp.members.4.status=left
> > runtime.totem.pg.mrp.srp.members.5.ip=r(0) ip(10.14.18.70)
> > runtime.totem.pg.mrp.srp.members.5.join_count=1
> > runtime.totem.pg.mrp.srp.members.5.status=joined
> > runtime.totem.pg.mrp.srp.members.3.ip=r(0) ip(10.14.18.67)
> > runtime.totem.pg.mrp.srp.members.3.join_count=3
> > runtime.totem.pg.mrp.srp.members.3.status=joined
> >
> > cluster.conf:
> >
> > <?xml version="1.0"?>
> > <cluster config_version="32" name="hv-1618-110-1">
> >   <fence_daemon clean_start="0"/>
> >   <cman transport="udpu" expected_votes="1"/>
> >   <logging debug="off"/>
> >   <clusternodes>
> >     <clusternode name="hv-1" votes="1" nodeid="1"><fence><method name="single"><device name="human"/></method></fence></clusternode>
> >     <clusternode name="hv-2" votes="1" nodeid="3"><fence><method name="single"><device name="human"/></method></fence></clusternode>
> >     <clusternode name="hv-3" votes="1" nodeid="4"><fence><method name="single"><device name="human"/></method></fence></clusternode>
> >     <clusternode name="hv-4" votes="1" nodeid="5"><fence><method name="single"><device name="human"/></method></fence></clusternode>
> >     <clusternode name="hv-5" votes="1" nodeid="2"><fence><method name="single"><device name="human"/></method></fence></clusternode>
> >     <clusternode name="hv-6" votes="1" nodeid="6"><fence><method name="single"><device name="human"/></method></fence></clusternode>
> >   </clusternodes>
> >   <fencedevices>
> >     <fencedevice name="human" agent="manual"/>
> >   </fencedevices>
> >   <rm/>
> > </cluster>
> >
> > (manual fencing just for testing)
> >
> > corosync.conf:
> >
> > compatibility: whitetank
> >
> > totem {
> >     version: 2
> >     secauth: off
> >     threads: 0
> >     # fail_recv_const: 5000
> >     interface {
> >         ringnumber: 0
> >         bindnetaddr: 10.14.18.0
> >         mcastaddr: 239.0.0.4
> >         mcastport: 5405
> >     }
> > }
> >
> > logging {
> >     fileline: off
> >     to_stderr: no
> >     to_logfile: yes
> >     to_syslog: yes
> >     # the pathname of the log file
> >     logfile: /var/log/cluster/corosync.log
> >     debug: off
> >     timestamp: on
> >     logger_subsys {
> >         subsys: AMF
> >         debug: off
> >     }
> > }
> >
> > amf {
> >     mode: disabled
> > }
>
> When using cman, corosync.conf is not used/read.
>
> Fabio
>
> >
> > On Sat, Feb 22, 2014 at 5:54 AM, Fabio M. Di Nitto <fdini...@redhat.com> wrote:
> >
> >     On 02/22/2014 10:33 AM, emmanuel segura wrote:
> >     > I know that if you need to modify anything outside the <rm>...</rm> tag
> >     > (used by rgmanager) in the cluster.conf file, you need to restart the
> >     > whole cluster stack; with cman+rgmanager I have never seen how to add
> >     > or remove a node from the cluster without restarting cman.
> >
> >     It depends on the version. On RHEL5 that's correct; on RHEL6 it also works
> >     for changes outside of <rm>, but there are some limitations, as some
> >     parameters just can't be changed at runtime.
> >     Fabio
> >
> >     > 2014-02-22 6:21 GMT+01:00 Bjoern Teipel <bjoern.tei...@internetbrands.com>:
> >     >
> >     >     Hi all,
> >     >
> >     >     who's using CLVM with CMAN in a cluster with more than 2 nodes in
> >     >     production?
> >     >     Did you guys manage to live-add a new node to the cluster while
> >     >     everything is running?
> >     >     I'm only able to add nodes while the cluster stack is shut down.
> >     >     That's certainly not a good idea when you have to run CLVM on
> >     >     hypervisors and you need to shut down all VMs to add a new box.
> >     >     It would also be good if you could paste some of your configs
> >     >     using IPMI fencing.
> >     >
> >     >     Thanks in advance,
> >     >     Bjoern
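P.S. Regarding the IPMI fencing configs I asked about in my original mail, what I have in mind is something along these lines (just a sketch of the shape I'm after; the device name, IP address and credentials below are made-up placeholders, not taken from any running cluster):

    <clusternode name="hv-1" votes="1" nodeid="1">
      <fence>
        <method name="ipmi">
          <device name="ipmi-hv-1"/>
        </method>
      </fence>
    </clusternode>
    ...
    <fencedevices>
      <fencedevice name="ipmi-hv-1" agent="fence_ipmilan" ipaddr="10.14.19.65"
                   login="fence" passwd="secret" lanplus="1"/>
    </fencedevices>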
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster