Hi everybody,

we have been using corosync directly to provide clustering for GFS2 on our CentOS 7.2 pools with only one network interface, and everything has been working great so far!
We now have a new setup with two network interfaces on every host in the cluster:

A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X)
B -> 10 Gbit (used for the iSCSI connection to storage, 10.220.246.X)

When we run corosync in this setup, the logs are continuously spammed with messages like these:

[12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout).
[12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep.
[12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10
[12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750
[12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state.
[12880] cl15-02 corosyncdebug [TOTEM ] got commit token
[12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state.
[12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41:
[12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47:
[12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41:
[12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41
[12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1
[12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47:
[12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41
[12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1
[12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery.
[12880] cl15-02 corosyncdebug [TOTEM ] got commit token
[12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token
[12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
[12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
[12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
[12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
[12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
[12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state
[12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0
[12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1
Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2)
Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2)
[12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state.
[12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. Members
[12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access
Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157)
[12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action
Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158)
[12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2
[12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1
[12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01
Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22)
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677
Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22)
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873
[12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1
[12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1
[12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
[12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2
[12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2
[12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
[12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0
[12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1
[12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2
[12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56
[12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service.
[12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0
[12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0
[12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join).

We do not get them when there is only a single network interface in the systems.
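
As a reading aid for the VOTEQ lines in the log above (total_votes=2, expected_votes=3, quorate: Yes), here is a minimal sketch of the simple-majority arithmetic. The helper names quorum_threshold and is_quorate are made up for illustration and are not part of corosync. The sketch assumes plain majority voting and ignores optional votequorum features such as two_node.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes.

    Assumption: plain majority voting, no special votequorum options.
    """
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Values taken from the log excerpt: two of three expected votes present.
    print(quorum_threshold(3))
    print(is_quorate(2, 3))
```

With three expected votes the threshold is 2, so the two-member membership (nodes 1 and 2) seen in the log still counts as quorate even though node 3 is missing from it.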
--------------------------------------------------------------------------------------

These are the network configurations on the three hosts:

[root@cl15-02 ~]# ifconfig | grep inet
inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255
inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255
inet 127.0.0.1 netmask 255.0.0.0

[root@cl15-08 ~]# ifconfig | grep inet
inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255
inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255
inet 127.0.0.1 netmask 255.0.0.0

[root@cl15-09 ~]# ifconfig | grep inet
inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255
inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255
inet 127.0.0.1 netmask 255.0.0.0

-----------------------------------------------------------------------------------

corosync-quorumtool output:

[root@cl15-02 ~]# corosync-quorumtool
Quorum information
------------------
Date:             Mon Apr 11 15:46:26 2016
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          1
Ring ID:          18952
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 cl15-02 (local)
         2          1 cl15-08
         3          1 cl15-09

---------------------------------------------------------------------------

/etc/corosync/corosync.conf:

[root@cl15-02 ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: gfs_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: cl15-02
        nodeid: 1
    }
    node {
        ring0_addr: cl15-08
        nodeid: 2
    }
    node {
        ring0_addr: cl15-09
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    debug: on
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster