OK.
What iptables rules are you using to block traffic?

Please make sure to keep at least lo working, and block BOTH sides (so
INPUT and OUTPUT). Something like:

iptables -A INPUT ! -i lo -p udp -j DROP && iptables -A OUTPUT ! -o lo
-p udp -j DROP

usually works well.
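For repeatable testing you could wrap the split/rejoin in a small script. This is only a sketch of the idea; the RUN=echo dry-run convention and the function names are my own, not anything corosync ships:

```shell
#!/bin/sh
# Sketch: isolate this node from the cluster by dropping all non-loopback
# UDP (totem traffic) in both directions, then undo it to rejoin.
# RUN defaults to "echo" so the script only prints the commands; run it
# with RUN= (empty) as root to actually apply the rules.
RUN="${RUN-echo}"

split_node() {
    # Block both directions, but leave lo alone so corosync keeps running
    $RUN iptables -A INPUT  ! -i lo -p udp -j DROP
    $RUN iptables -A OUTPUT ! -o lo -p udp -j DROP
}

rejoin_node() {
    # Delete the exact same rules so the partitions can merge again
    $RUN iptables -D INPUT  ! -i lo -p udp -j DROP
    $RUN iptables -D OUTPUT ! -o lo -p udp -j DROP
}

split_node   # with RUN=echo this prints the rules instead of adding them
```

Using -D with the identical rule spec removes exactly what -A added, so the node state is restored without flushing unrelated firewall rules.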

Honza

Mark Round napsal(a):
> Same behaviour. I switched to the CentOS 6.4-provided Pacemaker, Corosync and 
> CMAN. I configured a CMAN cluster of 4 nodes, and split one node off via 
> iptables DROP.
> 
> It now looks like this on the 3 nodes in one partition:
> 
> # corosync-quorumtool -s
> Version:          1.4.1
> Nodes:            3
> Ring ID:          1152
> Quorum type:      quorum_cman
> Quorate:          No
> 
> However, on the one victim node, it still thinks it has quorum after I drop 
> everything with iptables:
> 
> # corosync-quorumtool -s
> Version:          1.4.1
> Nodes:            4
> Ring ID:          1148
> Quorum type:      quorum_cman
> Quorate:          Yes
> 
> Note the old ring ID on the victim node. When I allow traffic again, both 
> partitions merge and I get a new Ring ID:
> 
> # corosync-quorumtool -s
> Version:          1.4.1
> Nodes:            4
> Ring ID:          1160
> Quorum type:      quorum_cman
> Quorate:          Yes
> 
> So, it's the same behaviour. One node on its own cannot seem to decide that 
> it is in a partition on its own...
> 
> -----Original Message-----
> From: Jan Friesse [mailto:[email protected]]
> Sent: 05 September 2013 13:10
> To: Mark Round; [email protected]
> Subject: Re: [corosync] Corosync quorum not updating on split node
> 
> Mark,
> quorum in 1.4.x has some problems (this may be one of them), which is why 
> it was completely rewritten in 2.x.
> 
> Can you please try to use cman and its quorum module? CMAN quorum is well 
> tested and should work.
> 
> Regards,
>   Honza
> 
> Mark Round napsal(a):
>> Hi all,
>>
>> I have a problem whereby when I create a network split/partition (by 
>> dropping traffic with iptables), the victim node for some reason does not 
>> realise it has split from the network. If I split a cluster into two 
>> partitions both with multiple nodes, one with quorum and one without, then 
>> things function as expected; it just appears that a single node on its own 
>> can't work out that it doesn't have quorum if it has no other nodes to talk 
>> to.
>>
>> A single victim node seems to recognise that it can't form a cluster due to 
>> network issues, but the status is not reflected in the output from 
>> corosync-quorumtool, and cluster services (via pacemaker) still continue to 
>> run. However, the other nodes in the rest of the cluster do realise they 
>> have lost contact with a node, no longer have quorum and correctly shut down 
>> services.
>>
>> When I block traffic on the victim node's eth0, the remaining nodes see that 
>> they cannot communicate with it and shut down:
>>
>> # corosync-quorumtool -s
>> Version:          1.4.5
>> Nodes:            3
>> Ring ID:          696
>> Quorum type:      corosync_votequorum
>> Quorate:          No
>> Node votes:       1
>> Expected votes:   7
>> Highest expected: 7
>> Total votes:      3
>> Quorum:           4 Activity blocked
>> Flags:
>>
>> However, the victim node still thinks everything is fine, and maintains a 
>> view of the cluster prior to the split:
>>
>> # corosync-quorumtool -s
>> Version:          1.4.5
>> Nodes:            4
>> Ring ID:          716
>> Quorum type:      corosync_votequorum
>> Quorate:          Yes
>> Node votes:       1
>> Expected votes:   7
>> Highest expected: 7
>> Total votes:      4
>> Quorum:           4
>> Flags:            Quorate
>>
>> However, it does notice in the logs that it cannot now form a cluster, as the 
>> following messages repeat constantly:
>>
>> corosync [MAIN  ] Totem is unable to form a cluster because of an operating 
>> system or network fault. The most common cause of this message is that the 
>> local firewall is configured improperly.
>>
>> I would expect at this point for it to be in its own network partition with 
>> a total of 1 vote, and block activity. However, this does not seem to happen 
>> until just after it rejoins the cluster. When I unblock traffic and it 
>> rejoins, I see the victim finally realise it had lost quorum:
>>
>> Sep 05 09:52:21 corosync [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 720: memb=1, new=0, lost=3
>> Sep 05 09:52:21 corosync [VOTEQ ] quorum lost, blocking activity
>> Sep 05 09:52:21 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services.
>> Sep 05 09:52:21 corosync [QUORUM] Members[1]: 358898186
>>
>> And a second or so later it regains quorum:
>>
>> crmd:   notice: ais_dispatch_message:         Membership 736: quorum acquired
>>
>> So my question is: why, when it realises it cannot form a cluster ("Totem is 
>> unable to form..."), does it not lose quorum, update the status as reflected 
>> by quorumtool, and shut down cluster services?
>>
>> Configuration file example and package versions/environment listed below. 
>> I'm using the "udpu" protocol as we need to avoid multicast in this environment; 
>> it will eventually be using a routed network. This behaviour also persists 
>> when I disable the pacemaker plugin and just test with corosync.
>>
>> compatibility: whitetank
>> totem {
>>     version: 2
>>     secauth: off
>>     interface {
>>         member {
>>             memberaddr: 10.90.100.20
>>         }
>>         member {
>>             memberaddr: 10.90.100.21
>>         }
>> ...
>> ... more nodes snipped
>> ...
>>         ringnumber: 0
>>         bindnetaddr: 10.90.100.20
>>         mcastport: 5405
>>     }
>>     transport: udpu
>> }
>> amf {
>>         mode: disabled
>> }
>> aisexec {
>>         user: root
>>         group: root
>> }
>> quorum {
>>         provider: corosync_votequorum
>>         expected_votes: 7
>> }
>> service {
>>         # Load the Pacemaker Cluster Resource Manager
>>         name: pacemaker
>>         ver: 0
>> }
>>
>> Environment : CentOS 6.4
>> Packages from OpenSUSE:
>> http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/x86_64/
>> # rpm -qa | egrep "^(cluster|corosync|crm|libqb|pacemaker|resource-agents)" | sort
>> cluster-glue-1.0.11-3.1.x86_64
>> cluster-glue-libs-1.0.11-3.1.x86_64
>> corosync-1.4.5-2.2.x86_64
>> corosynclib-1.4.5-2.2.x86_64
>> crmsh-1.2.6-0.rc3.3.1.x86_64
>> libqb0-0.14.4-1.2.x86_64
>> pacemaker-1.1.9-2.1.x86_64
>> pacemaker-cli-1.1.9-2.1.x86_64
>> pacemaker-cluster-libs-1.1.9-2.1.x86_64
>> pacemaker-libs-1.1.9-2.1.x86_64
>> resource-agents-3.9.5-3.1.x86_64
>>
>> Regards,
>>
>> -Mark
>>
>> ________________________________
>>
>> Mark Round
>> Senior Systems Administrator
>> NCC Group
>> Kings Court
>> Kingston Road
>> Leatherhead, KT22 7SL
>>
>> Telephone: +44 1372 383815
>> Mobile: +44 7790 770413
>> Fax:
>> Website: www.nccgroup.com<http://www.nccgroup.com>
>> Email:  [email protected]<mailto:[email protected]>
>>
>> ________________________________
>>
>> This email is sent for and on behalf of NCC Group. NCC Group is the trading 
>> name of NCC Group Performance Testing Limited (Registered in England CRN: 
>> 4069379). Registered Office: Manchester Technology Centre, Oxford Road, 
>> Manchester, M1 7EF. The ultimate holding company is NCC Group plc 
>> (Registered in England CRN: 4627044).
>>
>> Confidentiality: This e-mail contains proprietary information, some or all 
>> of which may be confidential and/or legally privileged. It is for the 
>> intended recipient only. If an addressing or transmission error has 
>> misdirected this e-mail, please notify the author by replying to this e-mail 
>> and then delete the original. If you are not the intended recipient you may 
>> not use, disclose, distribute, copy, print or rely on any information 
>> contained in this e-mail. You must not inform any other person other than 
>> NCC Group or the sender of its existence.
>>
>> For more information about NCC Group please visit
>> www.nccgroup.com<http://www.nccgroup.com>
>>
>> Before you print think about the ENVIRONMENT
>>
>>
>>
>> _______________________________________________
>> discuss mailing list
>> [email protected]
>> http://lists.corosync.org/mailman/listinfo/discuss
>>
> 

_______________________________________________
discuss mailing list
[email protected]
http://lists.corosync.org/mailman/listinfo/discuss
