On 18/09/14 09:29, Andrew Beekhof wrote:

On 18 Sep 2014, at 6:18 pm, Christine Caulfield <ccaul...@redhat.com> wrote:

On 18/09/14 02:35, Andrew Beekhof wrote:

On 18 Sep 2014, at 12:34 am, Vallevand, Mark K <mark.vallev...@unisys.com> 
wrote:

Thanks.

1. I didn't know about two-node mode.  Thanks.  We are testing with two nodes and "crm configure property 
no-quorum-policy=ignore".  When one node goes down, the other node continues clustering.  This is the desired 
behavior.  What will <cman two_node="1" expected_votes="1"> </cman> in cluster.conf do?

I was all set to be a smart-ass and say 'man cluster.conf', but the joke is on 
me as my colleagues do not appear to have documented it anywhere.
Chrissie: Can you elaborate on the details here please?


It's documented in the cman(5) man page. The cluster.conf(5) man page only covers 
the general parts that are not specific to any subsystem, so corosync items are 
documented in the corosync man pages and cman ones in the cman man page, etc.

Ah! Good to know.

        Two node clusters
               Ordinarily, the loss of quorum after one out of two nodes fails will
               prevent the remaining node from continuing (if both nodes have one
               vote.)  Special configuration options can be set to allow the one
               remaining node to continue operating if the other fails.  To do this
               only two nodes, each with one vote, can be defined in cluster.conf.
               The two_node and expected_votes values must then be set to 1 in the
               cman section as follows.

                 <cman two_node="1" expected_votes="1">
                 </cman>
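
For illustration, a minimal two-node cluster.conf built around that option might
look like the sketch below; the cluster name, node names and node ids are only
placeholders, and fencing (which you would want in practice) is omitted for brevity:

    <?xml version="1.0"?>
    <cluster name="example" config_version="1">
      <!-- let the surviving node keep quorum when its peer fails -->
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="node1" nodeid="1" votes="1"/>
        <clusternode name="node2" nodeid="2" votes="1"/>
      </clusternodes>
    </cluster>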

One thing that's not clear to me is what happens when a single node comes up and 
can only see itself.
Does it get quorum, or is it like wait_for_all in corosync 2?



There's no wait_for_all in cman. The first node up will (after fence_join_delay) attempt to fence the other node, to try to prevent a split brain.

This is one of several reasons why we insist that fencing be on a separate network from the heartbeat network in a two_node cluster.
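
A rough cluster.conf sketch of that arrangement is below. The start-up fencing
delay Chrissie mentions is, if memory serves, the post_join_delay attribute of the
fence_daemon section; the agent, address and credentials are placeholders only, the
point being that the fence device sits on the management network rather than the
heartbeat network:

    <fence_daemon post_join_delay="30"/>
    <fencedevices>
      <!-- IPMI interface reached over the management network, not the heartbeat network -->
      <fencedevice name="ipmi-node1" agent="fence_ipmilan"
                   ipaddr="10.0.1.1" login="admin" passwd="secret"/>
    </fencedevices>
    <clusternodes>
      <clusternode name="node1" nodeid="1">
        <fence>
          <method name="single">
            <device name="ipmi-node1"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>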


Chrissie


Chrissie


(Short version, it should do what you want)

2. Yes, fencing is part of our plan, but not at this time.  In the 
configurations we are testing, fencing is an RFPITA.
3. We could move up.  We like Ubuntu 12.04 LTS because it is a Long Term Support 
release, but we've upgraded packages as necessary.  So, if we move to the latest 
stable Pacemaker, Cman and Corosync (and others?), how could this help?

Well you might get 3+ years of bug fixes and performance improvements :-)


Is there a way to get the clustering software to 'poll' faster?  I mean, this 
NIC stalling at boot time only lasts about 2 seconds beyond the start of 
corosync, but it's 30 more seconds before the nodes see each other.  I see 
lots of parameters in the totem directive that seem interesting.  Would any of 
them be appropriate?
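
The totem values that most directly affect how quickly membership forms and
re-forms are token, consensus and join; with a cman-based stack they are normally
set through the <totem/> element in cluster.conf rather than in corosync.conf
directly. A sketch, with purely illustrative values (the defaults are usually
sensible, and shrinking them too far risks spurious membership changes):

    <!-- values are in milliseconds; illustrative only, not recommendations -->
    <totem token="3000" consensus="3600" join="60"/>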

Is there not a way to tell upstart not to start the cluster until the network 
is up?
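
On Ubuntu 12.04 the cman/corosync startup may be driven by SysV init scripts
rather than native upstart jobs, but if an upstart job is what starts the stack,
an override along these lines can hold it back until the interface has come up
(the job name and interface below are assumptions):

    # /etc/init/corosync.override -- hypothetical; only applies if corosync is an upstart job
    start on (local-filesystems and net-device-up IFACE=eth0)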


Andrew: Thanks for the prompt response.


Regards.
Mark K Vallevand

"If there are no dogs in Heaven, then when I die I want to go where they went."
-Will Rogers



-----Original Message-----
From: linux-cluster-boun...@redhat.com 
[mailto:linux-cluster-boun...@redhat.com] On Behalf Of Andrew Beekhof
Sent: Tuesday, September 16, 2014 08:51 PM
To: linux clustering
Subject: Re: [Linux-cluster] Cman (and corosync) starting before network 
interface is ready


On 17 Sep 2014, at 7:20 am, Vallevand, Mark K <mark.vallev...@unisys.com> wrote:

It looks like there is some odd delay in getting a network interface up and 
ready.  So, when cman starts corosync, it can't reach the cluster, and for a 
time the node is a member of a cluster-of-one.  The cluster-of-one begins 
starting resources.

1. enable two-node mode in cluster.conf (man page should indicate where/how) 
then disable no-quorum-policy=ignore
2. configure fencing
3. find a newer version of pacemaker; we're up to 1.1.12 now
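
For point 1, once two_node is in place the earlier workaround can be put back to
the default; with the crm shell that is roughly:

    # "stop" is the default no-quorum-policy; two_node/expected_votes=1 keeps the
    # surviving node quorate, so the policy no longer needs to be "ignore"
    crm configure property no-quorum-policy=stop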

A few seconds later, when the interface finally is up and ready, it takes about 
30 more seconds for the cluster-of-one to rejoin the larger cluster.  
The doubly-started resources are sorted out and all ends up OK.

Now, it is not a good thing to have these particular resources running twice. 
 I'd really like the clustering software to behave better, but I'm not sure 
what 'behave better' would be.

Is it possible to introduce a delay into cman or corosync startup?  Is that 
even wise?
Is there a parameter to get the clustering software to poll more often when it 
can't rejoin the cluster?

Any suggestions would be welcome.

Running Ubuntu 12.04 LTS.  Pacemaker 1.1.6.  Cman 3.1.7.  Corosync 1.4.2.

Regards.
Mark K Vallevand
"If there are no dogs in Heaven, then when I die I want to go where they went."
-Will Rogers




