On 12/04/16 13:45, Stefano Panella wrote:
> Hi everybody,
> 
> We have been using corosync directly to provide clustering for GFS2 on our 
> CentOS 7.2 pools with only one network interface, and all has been working 
> great so far!
> 
> We now have a new set-up with two network interfaces for every host in the 
> cluster:
> A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X)
> B -> 10 Gbit (used for iSCSI connection to storage, 10.220.246.X)
> 
> When we run corosync in this mode, the logs are continuously spammed by 
> messages like these:
> 
> [12880] cl15-02 corosyncdebug   [TOTEM ] entering GATHER state from 
> 0(consensus timeout).
> [12880] cl15-02 corosyncdebug   [TOTEM ] Creating commit token because I am 
> the rep.
> [12880] cl15-02 corosyncdebug   [TOTEM ] Saving state aru 10 high seq 
> received 10
> [12880] cl15-02 corosyncdebug   [MAIN  ] Storing new sequence id for ring 5750
> [12880] cl15-02 corosyncdebug   [TOTEM ] entering COMMIT state.
> [12880] cl15-02 corosyncdebug   [TOTEM ] got commit token
> [12880] cl15-02 corosyncdebug   [TOTEM ] entering RECOVERY state.
> [12880] cl15-02 corosyncdebug   [TOTEM ] TRANS [0] member 10.220.88.41:
> [12880] cl15-02 corosyncdebug   [TOTEM ] TRANS [1] member 10.220.88.47:
> [12880] cl15-02 corosyncdebug   [TOTEM ] position [0] member 10.220.88.41:
> [12880] cl15-02 corosyncdebug   [TOTEM ] previous ring seq 574c rep 
> 10.220.88.41
> [12880] cl15-02 corosyncdebug   [TOTEM ] aru 10 high delivered 10 received 
> flag 1
> [12880] cl15-02 corosyncdebug   [TOTEM ] position [1] member 10.220.88.47:
> [12880] cl15-02 corosyncdebug   [TOTEM ] previous ring seq 574c rep 
> 10.220.88.41
> [12880] cl15-02 corosyncdebug   [TOTEM ] aru 10 high delivered 10 received 
> flag 1
> 
> [12880] cl15-02 corosyncdebug   [TOTEM ] Did not need to originate any 
> messages in recovery.
> [12880] cl15-02 corosyncdebug   [TOTEM ] got commit token
> [12880] cl15-02 corosyncdebug   [TOTEM ] Sending initial ORF token
> [12880] cl15-02 corosyncdebug   [TOTEM ] token retrans flag is 0 my set 
> retrans flag0 retrans queue empty 1 count 0, aru 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] install seq 0 aru 0 high seq 
> received 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] token retrans flag is 0 my set 
> retrans flag0 retrans queue empty 1 count 1, aru 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] install seq 0 aru 0 high seq 
> received 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] token retrans flag is 0 my set 
> retrans flag0 retrans queue empty 1 count 2, aru 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] install seq 0 aru 0 high seq 
> received 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] token retrans flag is 0 my set 
> retrans flag0 retrans queue empty 1 count 3, aru 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] install seq 0 aru 0 high seq 
> received 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] retrans flag count 4 token aru 0 
> install seq 0 aru 0 0
> [12880] cl15-02 corosyncdebug   [TOTEM ] Resetting old ring state
> [12880] cl15-02 corosyncdebug   [TOTEM ] recovery to regular 1-0
> [12880] cl15-02 corosyncdebug   [TOTEM ] waiting_trans_ack changed to 1
> Apr 11 16:19:54 [13372] cl15-02 pacemakerd:     info: 
> pcmk_quorum_notification: Membership 22352: quorum retained (2)
> Apr 11 16:19:54 [13378] cl15-02       crmd:     info: 
> pcmk_quorum_notification: Membership 22352: quorum retained (2)
> [12880] cl15-02 corosyncdebug   [TOTEM ] entering OPERATIONAL state.
> [12880] cl15-02 corosyncnotice  [TOTEM ] A new membership 
> (10.220.88.41:22352) was formed. Members
> [12880] cl15-02 corosyncdebug   [SYNC  ] Committing synchronization for 
> corosync configuration map access
> Apr 11 16:19:54 [13373] cl15-02        cib:     info: cib_process_request:    
>   Forwarding cib_modify operation for section nodes to master 
> (origin=local/crmd/27157)
> [12880] cl15-02 corosyncdebug   [CMAP  ] Not first sync -> no action
> Apr 11 16:19:54 [13373] cl15-02        cib:     info: cib_process_request:    
>   Forwarding cib_modify operation for section status to master 
> (origin=local/crmd/27158)
> [12880] cl15-02 corosyncdebug   [CPG   ] got joinlist message from node 0x2
> [12880] cl15-02 corosyncdebug   [CPG   ] comparing: sender r(0) 
> ip(10.220.88.41) ; members(old:2 left:0)
> [12880] cl15-02 corosyncdebug   [CPG   ] comparing: sender r(0) 
> ip(10.220.88.47) ; members(old:2 left:0)
> [12880] cl15-02 corosyncdebug   [CPG   ] chosen downlist: sender r(0) 
> ip(10.220.88.41) ; members(old:2 left:0)
> [12880] cl15-02 corosyncdebug   [CPG   ] got joinlist message from node 0x1
> [12880] cl15-02 corosyncdebug   [SYNC  ] Committing synchronization for 
> corosync cluster closed process group service v1.01
> Apr 11 16:19:54 [13373] cl15-02        cib:     info: cib_process_request:    
>   Completed cib_modify operation for section nodes: OK (rc=0, 
> origin=cl15-02/crmd/27157, version=0.18.22)
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[0] group:clvmd, 
> ip:r(0) ip(10.220.88.41) , pid:35677
> Apr 11 16:19:54 [13373] cl15-02        cib:     info: cib_process_request:    
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=cl15-02/crmd/27158, version=0.18.22)
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[1] 
> group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[2] 
> group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[3] group:crmd\x00, 
> ip:r(0) ip(10.220.88.41) , pid:13378
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[4] 
> group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[5] 
> group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[6] group:cib\x00, 
> ip:r(0) ip(10.220.88.41) , pid:13373
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[7] 
> group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[8] group:crmd\x00, 
> ip:r(0) ip(10.220.88.47) , pid:12879
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[9] 
> group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[10] 
> group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[11] group:cib\x00, 
> ip:r(0) ip(10.220.88.47) , pid:12874
> [12880] cl15-02 corosyncdebug   [CPG   ] joinlist_messages[12] 
> group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873
> [12880] cl15-02 corosyncdebug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA 
> Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No 
> QdeviceMasterWins: No
> [12880] cl15-02 corosyncdebug   [VOTEQ ] got nodeinfo message from cluster 
> node 1
> [12880] cl15-02 corosyncdebug   [VOTEQ ] nodeinfo message[1]: votes: 1, 
> expected: 3 flags: 1
> [12880] cl15-02 corosyncdebug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA 
> Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No 
> QdeviceMasterWins: No
> [12880] cl15-02 corosyncdebug   [VOTEQ ] total_votes=2, expected_votes=3
> [12880] cl15-02 corosyncdebug   [VOTEQ ] node 1 state=1, votes=1, expected=3
> [12880] cl15-02 corosyncdebug   [VOTEQ ] node 2 state=1, votes=1, expected=3
> [12880] cl15-02 corosyncdebug   [VOTEQ ] node 3 state=2, votes=1, expected=3
> [12880] cl15-02 corosyncdebug   [VOTEQ ] lowest node id: 1 us: 1
> [12880] cl15-02 corosyncdebug   [VOTEQ ] highest node id: 2 us: 1
> [12880] cl15-02 corosyncdebug   [VOTEQ ] got nodeinfo message from cluster 
> node 1
> [12880] cl15-02 corosyncdebug   [VOTEQ ] nodeinfo message[0]: votes: 0, 
> expected: 0 flags: 0
> [12880] cl15-02 corosyncdebug   [VOTEQ ] got nodeinfo message from cluster 
> node 2
> [12880] cl15-02 corosyncdebug   [VOTEQ ] nodeinfo message[2]: votes: 1, 
> expected: 3 flags: 1
> [12880] cl15-02 corosyncdebug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA 
> Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No 
> QdeviceMasterWins: No
> [12880] cl15-02 corosyncdebug   [VOTEQ ] got nodeinfo message from cluster 
> node 2
> [12880] cl15-02 corosyncdebug   [VOTEQ ] nodeinfo message[0]: votes: 0, 
> expected: 0 flags: 0
> [12880] cl15-02 corosyncdebug   [SYNC  ] Committing synchronization for 
> corosync vote quorum service v1.0
> [12880] cl15-02 corosyncdebug   [VOTEQ ] total_votes=2, expected_votes=3
> [12880] cl15-02 corosyncdebug   [VOTEQ ] node 1 state=1, votes=1, expected=3
> [12880] cl15-02 corosyncdebug   [VOTEQ ] node 2 state=1, votes=1, expected=3
> [12880] cl15-02 corosyncdebug   [VOTEQ ] node 3 state=2, votes=1, expected=3
> [12880] cl15-02 corosyncdebug   [VOTEQ ] lowest node id: 1 us: 1
> [12880] cl15-02 corosyncdebug   [VOTEQ ] highest node id: 2 us: 1
> [12880] cl15-02 corosyncnotice  [QUORUM] Members[2]: 1 2
> [12880] cl15-02 corosyncdebug   [QUORUM] sending quorum notification to 
> (nil), length = 56
> [12880] cl15-02 corosyncnotice  [MAIN  ] Completed service synchronization, 
> ready to provide service.
> [12880] cl15-02 corosyncdebug   [TOTEM ] waiting_trans_ack changed to 0
> [12880] cl15-02 corosyncdebug   [QUORUM] got quorate request on 0x7f5a907749a0
> [12880] cl15-02 corosyncdebug   [TOTEM ] entering GATHER state from 11(merge 
> during join).
> 
> 
> and we do not get them when there is only a single network interface in the 
> systems.
> 
> --------------------------------------------------------------------------------------
> These are the network configurations on the three hosts:
> 
> [root@cl15-02 ~]# ifconfig | grep inet
>         inet 10.220.88.41  netmask 255.255.248.0  broadcast 10.220.95.255
>         inet 10.220.246.50  netmask 255.255.255.0  broadcast 10.220.246.255
>         inet 127.0.0.1  netmask 255.0.0.0
> 
> [root@cl15-08 ~]# ifconfig | grep inet
>         inet 10.220.88.47  netmask 255.255.248.0  broadcast 10.220.95.255
>         inet 10.220.246.51  netmask 255.255.255.0  broadcast 10.220.246.255
>         inet 127.0.0.1  netmask 255.0.0.0
> 
> [root@cl15-09 ~]# ifconfig | grep inet
>         inet 10.220.88.48  netmask 255.255.248.0  broadcast 10.220.95.255
>         inet 10.220.246.59  netmask 255.255.255.0  broadcast 10.220.246.255
>         inet 127.0.0.1  netmask 255.0.0.0
> 
> -----------------------------------------------------------------------------------
> corosync-quorumtool output:
> 
> [root@cl15-02 ~]# corosync-quorumtool
> Quorum information
> ------------------
> Date:             Mon Apr 11 15:46:26 2016
> Quorum provider:  corosync_votequorum
> Nodes:            3
> Node ID:          1
> Ring ID:          18952
> Quorate:          Yes
> 
> Votequorum information
> ----------------------
> Expected votes:   3
> Highest expected: 3
> Total votes:      3
> Quorum:           2
> Flags:            Quorate
> 
> Membership information
> ----------------------
>     Nodeid      Votes Name
>          1          1 cl15-02 (local)
>          2          1 cl15-08
>          3          1 cl15-09
> 
> ---------------------------------------------------------------------------
> /etc/corosync/corosync.conf:
> 
> [root@cl15-02 ~]# cat /etc/corosync/corosync.conf
> totem {
>     version: 2
>     secauth: off
>     cluster_name: gfs_cluster
>     transport: udpu
> }
> 
> nodelist {
>     node {
>         ring0_addr: cl15-02
>         nodeid: 1
>     }
> 
>     node {
>         ring0_addr: cl15-08
>         nodeid: 2
>     }
> 
>     node {
>         ring0_addr: cl15-09
>         nodeid: 3
>     }
> }
> 
> quorum {
>     provider: corosync_votequorum
> }
> 
> logging {
>     debug: on


You have debug logging on. At a guess, I would say that the config file
on the single-interface systems doesn't :)

Chrissie


>     to_logfile: yes
>     logfile: /var/log/cluster/corosync.log
>     to_syslog: yes
> }
> 
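
For reference, a minimal sketch of the same logging section with debug
output turned off. It assumes every other setting stays exactly as
posted above, and it only silences the debug-level lines; the
notice-level messages (for example "A new membership ... was formed")
would still be logged:

```
logging {
    # debug: off suppresses the debug-level TOTEM/CPG/VOTEQ lines quoted above
    debug: off
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
```

The change would need to be applied to corosync.conf on every node and
corosync reloaded or restarted for it to take effect.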

-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
