Hmmm. I'm still curious what two_node exactly does. In my testing, the clustering software comes up before the network is completely ready. (Why? That's another day.)
With just no-quorum-policy=ignore, regardless of the fence_join_delay value, the rebooted node fences the other node and starts up all split-brain. It takes about 30 seconds or so after the network is ready for the split brain to be detected.

With no-quorum-policy=ignore and two_node="1" expected_votes="1", regardless of the fence_join_delay value, the rebooted node fences the other node, but as soon as the network is ready the other node joins the network and there is no split-brain.

I'm happy that things are working, but I'm still curious for some idea about what two_node does.

Regards.
Mark K Vallevand

"If there are no dogs in Heaven, then when I die I want to go where they went."
-Will Rogers

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.

-----Original Message-----
From: linux-cluster-boun...@redhat.com [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Christine Caulfield
Sent: Thursday, September 18, 2014 03:33 AM
To: Andrew Beekhof
Cc: linux clustering
Subject: Re: [Linux-cluster] Cman (and corosync) starting before network interface is ready

On 18/09/14 09:29, Andrew Beekhof wrote:
>
> On 18 Sep 2014, at 6:18 pm, Christine Caulfield <ccaul...@redhat.com> wrote:
>
>> On 18/09/14 02:35, Andrew Beekhof wrote:
>>>
>>> On 18 Sep 2014, at 12:34 am, Vallevand, Mark K <mark.vallev...@unisys.com>
>>> wrote:
>>>
>>>> Thanks.
>>>>
>>>> 1. I didn't know about two-node mode. Thanks. We are testing with two
>>>> nodes and "crm configure property no-quorum-policy=ignore". When one node
>>>> goes down, the other node continues clustering. This is the desired
>>>> behavior. What will <cman two_node="1" expected_votes="1"> </cman> in
>>>> cluster.conf do?
>>>
>>> I was all set to be a smart-ass and say 'man cluster.conf', but the joke is
>>> on me as my colleagues do not appear to have documented it anywhere.
>>> Chrissie: Can you elaborate on the details here please?
>>>
>>
>> It's documented in the cman(5) man page. The entries in cluster.conf only
>> cover the general parts that are not specific to any subsystem. So corosync
>> items are documented in the corosync man page and cman ones in the cman man
>> page etc.
>
> Ah! Good to know.
>
> Two node clusters
> Ordinarily, the loss of quorum after one out of two nodes fails will
> prevent the remaining node from continuing (if both nodes have one vote.)
> Special configuration options can be set to allow the one remaining node
> to continue operating if the other fails. To do this only two nodes, each
> with one vote, can be defined in cluster.conf. The two_node and
> expected_votes values must then be set to 1 in the cman section as follows.
>
> <cman two_node="1" expected_votes="1">
> </cman>
>
> One thing that's not clear to me is what happens when a single node comes up
> and can only see itself.
> Does it get quorum or is it like wait-for-all in corosync2?
>

There's no wait_for_all in cman. The first node up will attempt (after fence_join_delay) to fence the other node in an attempt to stop a split brain. This is one of several reasons why we insist that the fencing is on a separate network to heartbeat on a two_node cluster.

Chrissie

>>
>> Chrissie
>>
>>
>>> (Short version, it should do what you want)
>>>
>>>> 2. Yes, fencing is part of our plan, but not at this time. In the
>>>> configurations we are testing, fencing is a RFPITA.
>>>> 3. We could move up. We like Ubuntu 12.04 LTS because it is Long Term
>>>> Support. But, we've upgraded packages as necessary. So, if we move to
>>>> the latest stable Pacemaker, Cman and Corosync (and others?), how could
>>>> this help?
>>>
>>> Well you might get 3+ years of bug fixes and performance improvements :-)
>>>
>>>>
>>>> Is there a way to get the clustering software to 'poll' faster? I mean,
>>>> this NIC stalling at boot time only lasts about 2 seconds beyond the start
>>>> of corosync. But, it's 30 more seconds before the nodes see each other. I
>>>> see lots of parameters in the totem directive that seem interesting.
>>>> Would any of them be appropriate?
>>>
>>> Is there not a way to tell upstart not to start the cluster until the
>>> network is up?
>>>
>>>>
>>>> Andrew: Thanks for the prompt response.
>>>>
>>>>
>>>> Regards.
>>>> Mark K Vallevand
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: linux-cluster-boun...@redhat.com
>>>> [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Andrew Beekhof
>>>> Sent: Tuesday, September 16, 2014 08:51 PM
>>>> To: linux clustering
>>>> Subject: Re: [Linux-cluster] Cman (and corosync) starting before network
>>>> interface is ready
>>>>
>>>>
>>>> On 17 Sep 2014, at 7:20 am, Vallevand, Mark K <mark.vallev...@unisys.com>
>>>> wrote:
>>>>
>>>>> It looks like there is some odd delay in getting a network interface up
>>>>> and ready. So, when cman starts corosync, it can't get to the cluster.
>>>>> So, for a time, the node is a member of a cluster-of-one. The
>>>>> cluster-of-one begins starting resources.
>>>>
>>>> 1. enable two-node mode in cluster.conf (man page should indicate
>>>> where/how) then disable no-quorum-policy=ignore
>>>> 2. configure fencing
>>>> 3. find a newer version of pacemaker, we're up to .12 now
>>>>
>>>>> A few seconds later, when the interface finally is up and ready, it takes
>>>>> about 30 more seconds for the cluster-of-one to finally rejoin the larger
>>>>> cluster. The doubly-started resources are sorted out and all ends up OK.
>>>>>
>>>>> Now, this is not a good thing to have these particular resources running
>>>>> twice. I'd really like the clustering software to behave better. But,
>>>>> I'm not sure what 'behave better' would be.
>>>>>
>>>>> Is it possible to introduce a delay into cman or corosync startup? Is
>>>>> that even wise?
>>>>> Is there a parameter to get the clustering software to poll more often
>>>>> when it can't rejoin the cluster?
>>>>>
>>>>> Any suggestions would be welcome.
>>>>>
>>>>> Running Ubuntu 12.04 LTS. Pacemaker 1.1.6. Cman 3.1.7. Corosync 1.4.2.
>>>>>
>>>>> Regards.
>>>>> Mark K Vallevand
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> Linux-cluster@redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
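
For anyone else wondering what two_node changes, here is a rough sketch of the quorum arithmetic described in the quoted cman(5) text. It is a toy model, not cman's code. It assumes a simple-majority rule (more than half of expected_votes) and assumes that two_node pins the threshold at one vote. The function names quorum_threshold and is_quorate are made up for illustration.

```python
def quorum_threshold(expected_votes: int, two_node: bool = False) -> int:
    """Votes needed for quorum in this toy model.

    Assumption: simple majority of expected_votes, except that the
    two_node special case lets a single vote carry quorum.
    """
    if two_node:
        # Models <cman two_node="1" expected_votes="1">: one vote suffices.
        return 1
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int,
               two_node: bool = False) -> bool:
    """True if the votes visible to this node reach the threshold."""
    return votes_present >= quorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # Two nodes, one vote each, no special case: quorum needs both votes.
    print(quorum_threshold(2))                 # 2
    print(is_quorate(1, 2))                    # False: survivor loses quorum

    # Same cluster with two_node="1" expected_votes="1".
    print(is_quorate(1, 1, two_node=True))     # True: survivor keeps running
```

Note that in this model each half of a partitioned two-node cluster still sees one vote and is therefore quorate on its own, so quorum cannot decide which side wins. That is consistent with Chrissie's point that the first node up fences the other, and that fencing should be on a separate network from heartbeat.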