Re: [Linux-cluster] Cman (and corosync) starting before network interface is ready

Vallevand, Mark K Thu, 18 Sep 2014 06:19:32 -0700

Hmmm.  I'm still curious what two_node exactly does.  

In my testing, the clustering software comes up before the network is 
completely ready.  (Why?  That's another day.)
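
One way to sidestep the race is to hold off starting cman until the interface reports that it is up. What follows is only a minimal sketch in Python, not something tested on this cluster. It assumes the Linux sysfs file /sys/class/net/<iface>/operstate is available. The function names and the interface name "eth0" are hypothetical. Note also that some drivers report "unknown" rather than "up", and an "up" link does not guarantee that an address has been configured.

```python
import time
from pathlib import Path
from typing import Callable


def read_operstate(iface: str) -> str:
    """Return the kernel's operational state for iface, or 'missing'."""
    path = Path("/sys/class/net") / iface / "operstate"
    try:
        return path.read_text().strip()
    except OSError:
        # Interface does not exist (yet) or sysfs is unavailable.
        return "missing"


def wait_for_interface(
    iface: str,
    timeout: float = 30.0,
    interval: float = 0.5,
    reader: Callable[[str], str] = read_operstate,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until iface reports 'up'; return False if timeout expires first."""
    deadline = clock() + timeout
    while True:
        if reader(iface) == "up":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A wrapper could call wait_for_interface("eth0") and start the cluster stack only when it returns True. The reader, sleep and clock parameters exist so the loop can be exercised without real hardware.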


With just no-quorum-policy=ignore, regardless of the fence_join_delay value, 
the rebooted node fences the other node and starts up all split-brain.  It 
takes about 30 seconds or so after the network is ready for the split brain to 
be detected.

With no-quorum-policy=ignore and two_node="1" expected_votes="1", regardless of 
the fence_join_delay value, the rebooted node fences the other node, but as 
soon as the network is ready the other node joins the network and there is no 
split-brain.

I'm happy that things are working, but I'm still curious for some idea about 
what two_node does.


Regards.
Mark K Vallevand

"If there are no dogs in Heaven, then when I die I want to go where they went." 
-Will Rogers

THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
MATERIAL and is thus for use only by the intended recipient. If you received 
this in error, please contact the sender and delete the e-mail and its 
attachments from all computers.


-----Original Message-----
From: linux-cluster-boun...@redhat.com 
[mailto:linux-cluster-boun...@redhat.com] On Behalf Of Christine Caulfield
Sent: Thursday, September 18, 2014 03:33 AM
To: Andrew Beekhof
Cc: linux clustering
Subject: Re: [Linux-cluster] Cman (and corosync) starting before network 
interface is ready

On 18/09/14 09:29, Andrew Beekhof wrote:
>
> On 18 Sep 2014, at 6:18 pm, Christine Caulfield <ccaul...@redhat.com> wrote:
>
>> On 18/09/14 02:35, Andrew Beekhof wrote:
>>>
>>> On 18 Sep 2014, at 12:34 am, Vallevand, Mark K <mark.vallev...@unisys.com> 
>>> wrote:
>>>
>>>> Thanks.
>>>>
>>>> 1. I didn't know about two-node mode.  Thanks.  We are testing with two 
>>>> nodes and "crm configure property no-quorum-policy=ignore".  When one node 
>>>> goes down, the other node continues clustering.  This is the desired 
>>>> behavior.  What will <cman two_node="1" expected_votes="1"> </cman> in 
>>>> cluster.conf do?
>>>
>>> I was all set to be a smart-ass and say 'man cluster.conf', but the joke is 
>>> on me as my colleagues do not appear to have documented it anywhere.
>>> Chrissie: Can you elaborate on the details here please?
>>>
>>
>> it's documented in the cman(5) man page. The entries in cluster.conf only 
>> cover the general parts that are not specific to any subsystem. So corosync 
>> items are documented in the corosync man page and cman ones in the cman man 
>> page etc.
>
> Ah! Good to know.
>
>         Two node clusters
>                Ordinarily, the loss of quorum after one out of two nodes
>                fails will prevent the remaining node from continuing (if
>                both nodes have one vote.)  Special configuration options
>                can be set to allow the one remaining node to continue
>                operating if the other fails.  To do this only two nodes,
>                each with one vote, can be defined in cluster.conf.  The
>                two_node and expected_votes values must then be set to 1
>                in the cman section as follows.
>
>                  <cman two_node="1" expected_votes="1">
>                  </cman>
>
> One thing that's not clear to me is what happens when a single node comes up 
> and can only see itself.
> Does it get quorum or is it like wait-for-all in corosync2?
>


There's no wait_for_all in cman. The first node up will attempt (after 
fence_join_delay) to fence the other node in an attempt to stop a split brain.

This is one of several reasons why we insist that the fencing is on a 
separate network to heartbeat on a two_node cluster.
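
For readers following along, the arithmetic behind the quoted cman(5) text can be sketched as a simple majority rule. This is a simplified model in Python, not cman's actual code. The function names are made up for illustration, and the only inputs taken from the thread are two_node and expected_votes.

```python
from typing import List


def quorum_threshold(expected_votes: int, two_node: bool = False) -> int:
    """Votes needed for quorum in a simplified majority model."""
    if two_node:
        # Special case from the man page excerpt: one vote is enough,
        # so the surviving node of a two-node cluster keeps running.
        return 1
    # Ordinary case: strictly more than half of the expected votes.
    return expected_votes // 2 + 1


def has_quorum(votes_present: int, expected_votes: int,
               two_node: bool = False) -> bool:
    """True if the votes visible to a node reach the quorum threshold."""
    return votes_present >= quorum_threshold(expected_votes, two_node)


def quorate_sides(partition_votes: List[int], expected_votes: int,
                  two_node: bool = False) -> List[bool]:
    """For each side of a network partition, report whether it is quorate."""
    return [has_quorum(v, expected_votes, two_node) for v in partition_votes]
```

In this model a lone node in an ordinary two-node cluster holds 1 vote against a threshold of 2 and stops. With two_node set, both sides of a partition remain quorate at the same time, which is why fencing has to decide the winner.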


Chrissie

>>
>> Chrissie
>>
>>
>>> (Short version, it should do what you want)
>>>
>>>> 2. Yes, fencing is part of our plan, but not at this time.  In the 
>>>> configurations we are testing, fencing is a RFPITA.
>>>> 3. We could move up.  We like Ubuntu 12.04 LTS because it is Long Term 
>>>> Support.  But, we've upgraded packages as necessary.  So, if we move to 
>>>> the latest stable Pacemaker, Cman and Corosync (and others?), how could 
>>>> this help?
>>>
>>> Well you might get 3+ years of bug fixes and performance improvements :-)
>>>
>>>>
>>>> Is there a way to get the clustering software to 'poll' faster?  I mean, 
>>>> this NIC stalling at boot time only lasts about 2 seconds beyond the start 
>>>> of corosync.  But, it's 30 more seconds before the nodes see each other.  I 
>>>> see lots of parameters in the totem directive that seem interesting.  
>>>> Would any of them be appropriate?
>>>
>>> Is there not a way to tell upstart not to start the cluster until the 
>>> network is up?
>>>
>>>>
>>>> Andrew: Thanks for the prompt response.
>>>>
>>>>
>>>> Regards.
>>>> Mark K Vallevand
>>>>
>>>> "If there are no dogs in Heaven, then when I die I want to go where they 
>>>> went."
>>>> -Will Rogers
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: linux-cluster-boun...@redhat.com 
>>>> [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Andrew Beekhof
>>>> Sent: Tuesday, September 16, 2014 08:51 PM
>>>> To: linux clustering
>>>> Subject: Re: [Linux-cluster] Cman (and corosync) starting before network 
>>>> interface is ready
>>>>
>>>>
>>>> On 17 Sep 2014, at 7:20 am, Vallevand, Mark K <mark.vallev...@unisys.com> 
>>>> wrote:
>>>>
>>>>> It looks like there is some odd delay in getting a network interface up 
>>>>> and ready.  So, when cman starts corosync, it can't get to the cluster.  
>>>>> So, for a time, the node is a member of a cluster-of-one.  The 
>>>>> cluster-of-one begins starting resources.
>>>>
>>>> 1. enable two-node mode in cluster.conf (man page should indicate 
>>>> where/how) then disable no-quorum-policy=ignore
>>>> 2. configure fencing
>>>> 3. find a newer version of pacemaker, we're up to .12 now
>>>>
>>>>> A few seconds later, when the interface finally is up and ready, it takes 
>>>>> about 30 more seconds for the cluster-of-one to finally rejoin the larger 
>>>>> cluster.  The doubly-started resources are sorted out and all ends up OK.
>>>>>
>>>>> Now, it is not a good thing to have these particular resources running 
>>>>> twice.  I'd really like the clustering software to behave better.  But, 
>>>>> I'm not sure what 'behave better' would be.
>>>>>
>>>>> Is it possible to introduce a delay into cman or corosync startup?  Is 
>>>>> that even wise?
>>>>> Is there a parameter to get the clustering software to poll more often 
>>>>> when it can't rejoin the cluster?
>>>>>
>>>>> Any suggestions would be welcome.
>>>>>
>>>>> Running Ubuntu 12.04 LTS.  Pacemaker 1.1.6.  Cman 3.1.7.  Corosync 1.4.2.
>>>>>
>>>>> Regards.
>>>>> Mark K Vallevand
>>>>> "If there are no dogs in Heaven, then when I die I want to go where they 
>>>>> went."
>>>>> -Will Rogers
>>>>>
>>>>> --
>>>>> Linux-cluster mailing list
>>>>> Linux-cluster@redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster@redhat.com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>
>

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
