Re: [Linux-cluster] Cman (and corosync) starting before network interface is ready

Andrew Beekhof Thu, 18 Sep 2014 03:30:13 -0700

That doesn't sound much different from no-quorum-policy=ignore,
so I guess it won't help here.
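
To illustrate why the two settings look alike for a node that boots alone, here is a minimal Python sketch of the usual majority-quorum arithmetic. It is a simplified model and not cman's actual code; the function names quorum_threshold and has_quorum are made up for illustration.

```python
def quorum_threshold(expected_votes: int, two_node: bool = False) -> int:
    """Votes needed for quorum in this simplified model.

    Normally a strict majority of expected_votes is required.
    In two-node mode (two_node="1" expected_votes="1") a single
    vote is enough.
    """
    if two_node:
        return 1
    return expected_votes // 2 + 1


def has_quorum(visible_votes: int, expected_votes: int, two_node: bool = False) -> bool:
    """True if the votes this node can currently see reach the threshold."""
    return visible_votes >= quorum_threshold(expected_votes, two_node)


if __name__ == "__main__":
    # A node that comes up before its NIC is ready sees only its own vote.
    print("plain two-node cluster:", has_quorum(1, expected_votes=2))
    print("two_node mode:         ", has_quorum(1, expected_votes=1, two_node=True))
```

In this model a lone node in two_node mode is quorate straight away, just as it effectively is with no-quorum-policy=ignore, so only fencing stands between it and a split brain.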


> On 18 Sep 2014, at 6:33 pm, Christine Caulfield <ccaul...@redhat.com> wrote:
> 
>> On 18/09/14 09:29, Andrew Beekhof wrote:
>> 
>>> On 18 Sep 2014, at 6:18 pm, Christine Caulfield <ccaul...@redhat.com> wrote:
>>> 
>>>> On 18/09/14 02:35, Andrew Beekhof wrote:
>>>> 
>>>>> On 18 Sep 2014, at 12:34 am, Vallevand, Mark K 
>>>>> <mark.vallev...@unisys.com> wrote:
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> 1. I didn't know about two-node mode.  Thanks.  We are testing with two 
>>>>> nodes and "crm configure property no-quorum-policy=ignore".  When one 
>>>>> node goes down, the other node continues clustering.  This is the desired 
>>>>> behavior.  What will <cman two_node="1" expected_votes="1"> </cman> in 
>>>>> cluster.conf do?
>>>> 
>>>> I was all set to be a smart-ass and say 'man cluster.conf', but the joke 
>>>> is on me as my colleagues do not appear to have documented it anywhere.
>>>> Chrissie: Can you elaborate on the details here please?
>>>> 
>>> 
>>> It's documented in the cman(5) man page. The entries in cluster.conf only 
>>> cover the general parts that are not specific to any subsystem. So corosync 
>>> items are documented in the corosync man page and cman ones in the cman man 
>>> page etc.
>> 
>> Ah! Good to know.
>> 
>>        Two node clusters
>>               Ordinarily, the loss of quorum after one out of two nodes
>>               fails will prevent the remaining node from continuing (if
>>               both nodes have one vote.)  Special configuration options
>>               can be set to allow the one remaining node to continue
>>               operating if the other fails.  To do this only two nodes,
>>               each with one vote, can be defined in cluster.conf.  The
>>               two_node and expected_votes values must then be set to 1
>>               in the cman section as follows.
>> 
>>                 <cman two_node="1" expected_votes="1">
>>                 </cman>
>> 
>> One thing that's not clear to me is what happens when a single node comes up 
>> and can only see itself.
>> Does it get quorum or is it like wait-for-all in corosync2?
>> 
> 
> 
> There's no wait_for_all in cman. The first node up will attempt (after 
> fence_join_delay) to fence the other node in order to prevent a split brain.
> 
> This is one of several reasons why we insist that the fencing is on a 
> separate network to heartbeat on a two_node cluster.
> 
> 
> Chrissie
> 
>>> 
>>> Chrissie
>>> 
>>> 
>>>> (Short version, it should do what you want)
>>>> 
>>>>> 2. Yes, fencing is part of our plan, but not at this time.  In the 
>>>>> configurations we are testing, fencing is a RFPITA.
>>>>> 3. We could move up.  We like Ubuntu 12.04 LTS because it is Long Term 
>>>>> Support.  But, we've upgraded packages as necessary.  So, if we move to 
>>>>> the latest stable Pacemaker, Cman and Corosync (and others?), how could 
>>>>> this help?
>>>> 
>>>> Well you might get 3+ years of bug fixes and performance improvements :-)
>>>> 
>>>>> 
>>>>> Is there a way to get the clustering software to 'poll' faster?  I mean, 
>>>>> this NIC stalling at boot time only lasts about 2 seconds beyond the 
>>>>> start of corosync.  But, it's 30 more seconds before the nodes see each 
>>>>> other.  I see lots of parameters in the totem directive that seem 
>>>>> interesting.  Would any of them be appropriate?
>>>> 
>>>> Is there not a way to tell upstart not to start the cluster until the 
>>>> network is up?
>>>> 
>>>>> 
>>>>> Andrew: Thanks for the prompt response.
>>>>> 
>>>>> 
>>>>> Regards.
>>>>> Mark K Vallevand
>>>>> 
>>>>> "If there are no dogs in Heaven, then when I die I want to go where they 
>>>>> went."
>>>>> -Will Rogers
>>>>> 
>>>>> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
>>>>> MATERIAL and is thus for use only by the intended recipient. If you 
>>>>> received this in error, please contact the sender and delete the e-mail 
>>>>> and its attachments from all computers.
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: linux-cluster-boun...@redhat.com 
>>>>> [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Andrew Beekhof
>>>>> Sent: Tuesday, September 16, 2014 08:51 PM
>>>>> To: linux clustering
>>>>> Subject: Re: [Linux-cluster] Cman (and corosync) starting before network 
>>>>> interface is ready
>>>>> 
>>>>> 
>>>>>> On 17 Sep 2014, at 7:20 am, Vallevand, Mark K 
>>>>>> <mark.vallev...@unisys.com> wrote:
>>>>>> 
>>>>>> It looks like there is some odd delay in getting a network interface up 
>>>>>> and ready.  So, when cman starts corosync, it can't get to the cluster.  
>>>>>> So, for a time, the node is a member of a cluster-of-one.  The 
>>>>>> cluster-of-one begins starting resources.
>>>>> 
>>>>> 1. enable two-node mode in cluster.conf (man page should indicate 
>>>>> where/how) then disable no-quorum-policy=ignore
>>>>> 2. configure fencing
>>>>> 3. find a newer version of pacemaker, we're up to 1.1.12 now
>>>>> 
>>>>>> A few seconds later, when the interface finally is up and ready, it 
>>>>>> takes about 30 more seconds for the cluster-of-one to finally rejoin the 
>>>>>> larger cluster.  The doubly-started resources are sorted out and all 
>>>>>> ends up OK.
>>>>>> 
>>>>>> Now, it is not a good thing to have these particular resources running 
>>>>>> twice.  I'd really like the clustering software to behave better.  But, 
>>>>>> I'm not sure what 'behave better' would be.
>>>>>> 
>>>>>> Is it possible to introduce a delay into cman or corosync startup?  Is 
>>>>>> that even wise?
>>>>>> Is there a parameter to get the clustering software to poll more often 
>>>>>> when it can't rejoin the cluster?
>>>>>> 
>>>>>> Any suggestions would be welcome.
>>>>>> 
>>>>>> Running Ubuntu 12.04 LTS.  Pacemaker 1.1.6.  Cman 3.1.7.  Corosync 1.4.2.
>>>>>> 
>>>>>> Regards.
>>>>>> Mark K Vallevand
>>>>>> "If there are no dogs in Heaven, then when I die I want to go where they 
>>>>>> went."
>>>>>> -Will Rogers
>>>>>> 
>>>>>> --
>>>>>> Linux-cluster mailing list
>>>>>> Linux-cluster@redhat.com
>>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

