I'm trying to get corosync running inside two Docker containers. One of
them is spewing out lots of "The token was lost in the COMMIT state."
messages. The other is simply logging "The consensus timeout expired."
(which, given the state of the other node, is expected).
Googling the COMMIT state message turns up almost nothing, so I have no
idea what it means.
Both nodes run inside Docker containers whose traffic is NATed before
it leaves the server, and corosync is using the UDPU transport. I've
taken this into consideration and have manually set the nodeid for
each node so that it isn't derived from the IP address.
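To illustrate why an IP-derived nodeid would be a problem behind NAT, here is a minimal Python sketch. It assumes that corosync 2.x, when no nodeid is configured, uses the ring0 IPv4 address read as a 32-bit integer (with the top bit cleared if `clear_node_high_bit` is set); `nodeid_from_ipv4` and `is_valid_nodeid` are hypothetical helpers, not corosync APIs. Under that assumption, node 1's container address and its NATed server address map to two different IDs. The sketch also checks the hand-picked IDs: both are non-zero 32-bit values, although 2585129852 is above the signed 32-bit range, which could matter for any client that treats nodeids as signed.

```python
import ipaddress

MAX_NODEID = 2**32 - 1  # nodeid is an unsigned 32-bit value
MAX_SIGNED = 2**31 - 1  # largest positive signed 32-bit value


def nodeid_from_ipv4(addr: str, clear_high_bit: bool = False) -> int:
    """Hypothetical helper: read an IPv4 address as a big-endian 32-bit integer.

    Assumption: this approximates how corosync 2.x picks a default nodeid
    when none is configured; check corosync.conf(5) for your version.
    """
    value = int(ipaddress.IPv4Address(addr))
    if clear_high_bit:
        value &= 0x7FFFFFFF  # drop the top bit so the value is positive when signed
    return value


def is_valid_nodeid(nodeid: int) -> bool:
    """A configured nodeid must fit in 32 bits; 0 is reserved."""
    return 0 < nodeid <= MAX_NODEID


if __name__ == "__main__":
    # Node 1 is known by two addresses: its container IP and its NATed server IP.
    for addr in ("172.17.0.21", "10.20.27.52"):
        print(addr, nodeid_from_ipv4(addr))

    # The hand-picked IDs from the nodelist in the post.
    for nodeid in (1862911301, 2585129852):
        print(nodeid, "valid:", is_valid_nodeid(nodeid),
              "fits signed 32-bit:", nodeid <= MAX_SIGNED)
```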
tcpdump shows that both nodes are receiving traffic from the other
node. However, the node throwing 'lost in the COMMIT state' is only
sending a packet every few seconds, whereas the 'consensus timeout'
node is sending a steady stream (about one packet every 50 ms in the
captures below).
Node 1:
------------
Name: i-cd3b0393
Container IP: 172.17.0.21 (the IP corosync binds to)
Server IP: 10.20.27.52
Version: 2.3.3 (Fedora 20)
corosync.conf:
totem {
    version: 2
    token: 2000
    token_retransmits_before_loss_const: 10
    vsftype: none
    secauth: off
    transport: udpu
}
logging {
    fileline: off
    syslog_facility: local2
    syslog_priority: debug
}
quorum {
    provider: corosync_votequorum
}
nodelist {
    node {
        nodeid: 1862911301
        ring0_addr: i-a2542ffc
    }
    node {
        nodeid: 2585129852
        ring0_addr: i-cd3b0393
    }
}
/etc/hosts:
172.17.0.21 i-cd3b0393
10.20.50.204 i-a2542ffc
logs:
Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]: [TOTEM ] The consensus timeout expired.
Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]: [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Aug 29 02:53:18 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:19 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:21 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]: [TOTEM ] The consensus timeout expired.
Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]: [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Aug 29 02:53:22 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:24 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:25 i-cd3b0393 local2.warn corosync[318]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]: [TOTEM ] The consensus timeout expired.
Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]: [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
tcpdump:
03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.896435 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163
03:03:58.946487 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.996544 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
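A five-line capture is too short to prove much, but the per-direction packet rate can be checked mechanically rather than by eye. This sketch groups node 1's capture by direction and averages the spacing between packets; the regex assumes tcpdump's default one-line UDP summary format exactly as pasted above. In this excerpt the outgoing packets are about 50 ms apart, with a single packet in the other direction.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Node 1's tcpdump excerpt, copied from the post.
CAPTURE = """\
03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.896435 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163
03:03:58.946487 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.996544 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
"""

# tcpdump's one-line UDP summary: HH:MM:SS.usec IP src.port > dst.port: UDP, length N
LINE_RE = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) IP (\S+)\.(\d+) > (\S+)\.(\d+): UDP, length (\d+)$"
)


def packets_by_direction(capture: str) -> Dict[Tuple[str, str], List[float]]:
    """Group packet timestamps (seconds since midnight) by (source IP, destination IP)."""
    flows: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for line in capture.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a UDP summary line
        hours, minutes, seconds = int(m.group(1)), int(m.group(2)), float(m.group(3))
        flows[(m.group(4), m.group(6))].append(hours * 3600 + minutes * 60 + seconds)
    return dict(flows)


def mean_gap_ms(times: List[float]) -> Optional[float]:
    """Average spacing between consecutive packets in ms (None if fewer than 2)."""
    if len(times) < 2:
        return None
    return mean(later - earlier for earlier, later in zip(times, times[1:])) * 1000


if __name__ == "__main__":
    for (src, dst), times in packets_by_direction(CAPTURE).items():
        gap = mean_gap_ms(times)
        gap_text = f"{gap:.1f} ms" if gap is not None else "n/a"
        print(f"{src} -> {dst}: {len(times)} packet(s), mean gap {gap_text}")
```

Pasting node 2's capture into `CAPTURE` gives the same picture from the other side.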
corosync-quorumtool:
Quorum information
------------------
Date: Fri Aug 29 02:57:45 2014
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 2585129852
Ring ID: 2904
Quorate: No
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
2585129852 1 i-cd3b0393 (local)
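The "Activity blocked" line follows directly from the vote counts. A minimal sketch of the majority rule, assuming votequorum's default behaviour (no `two_node` or other special options): quorum is a strict majority of the expected votes, so with 2 expected votes the threshold is 2 and a node that only sees itself (1 vote) cannot be quorate. Node 2 reports the same numbers below.

```python
def votequorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum: a strict majority of the expected votes.

    Assumption: plain votequorum without special options such as two_node.
    """
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes currently seen reach the majority threshold."""
    return total_votes >= votequorum_threshold(expected_votes)


if __name__ == "__main__":
    # Values reported by corosync-quorumtool on this node.
    print("quorum:", votequorum_threshold(2),
          "quorate:", is_quorate(total_votes=1, expected_votes=2))
```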
========================================
Node 2:
------------
Name: i-a2542ffc
Container IP: 172.17.0.7 (the IP corosync binds to)
Server IP: 10.20.50.204
Version: 2.3.3 (Fedora 20)
corosync.conf:
totem {
    version: 2
    token: 2000
    token_retransmits_before_loss_const: 10
    vsftype: none
    secauth: off
    transport: udpu
}
logging {
    fileline: off
    syslog_facility: local2
    syslog_priority: debug
}
quorum {
    provider: corosync_votequorum
}
nodelist {
    node {
        nodeid: 1862911301
        ring0_addr: i-a2542ffc
    }
    node {
        nodeid: 2585129852
        ring0_addr: i-cd3b0393
    }
}
/etc/hosts:
172.17.0.7 i-a2542ffc
10.20.27.52 i-cd3b0393
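Because both nodes share an identical nodelist but have different /etc/hosts files, each ring0_addr name resolves to a different address depending on which node looks it up. This sketch tabulates that using only the addresses from the post and checks the intended layout (own name resolves to the container bind IP, peer name resolves to the peer's server IP). It is a consistency check of the hosts files, not a reproduction of corosync's own name resolution; `views_of` and `check_resolution` are hypothetical helpers.

```python
from typing import Dict, List

# /etc/hosts on each node, as posted (node name -> {hostname: address}).
HOSTS: Dict[str, Dict[str, str]] = {
    "i-cd3b0393": {"i-cd3b0393": "172.17.0.21", "i-a2542ffc": "10.20.50.204"},
    "i-a2542ffc": {"i-a2542ffc": "172.17.0.7", "i-cd3b0393": "10.20.27.52"},
}
# Container IP corosync binds to, and the server IP each node sits behind.
BIND_IP = {"i-cd3b0393": "172.17.0.21", "i-a2542ffc": "172.17.0.7"}
SERVER_IP = {"i-cd3b0393": "10.20.27.52", "i-a2542ffc": "10.20.50.204"}


def views_of(name: str) -> Dict[str, str]:
    """Address each node resolves the ring0_addr `name` to."""
    return {node: hosts[name] for node, hosts in HOSTS.items()}


def check_resolution(node: str) -> List[str]:
    """Check the intended layout: own name -> bind IP, peer name -> peer's server IP."""
    report = []
    for name, addr in sorted(HOSTS[node].items()):
        expected = BIND_IP[node] if name == node else SERVER_IP[name]
        status = "ok" if addr == expected else f"MISMATCH (expected {expected})"
        report.append(f"{node}: {name} -> {addr} [{status}]")
    return report


if __name__ == "__main__":
    for node in HOSTS:
        print("\n".join(check_resolution(node)))
    for name in HOSTS:
        print(name, "is known as", views_of(name))
```

The check passes, but it also makes explicit that every node is known by one address to itself and by another address to its peer.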
logs:
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b88
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering COMMIT state.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b8c
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering COMMIT state.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b90
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] entering COMMIT state.
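Node 2's log repeats the same GATHER, commit token, COMMIT cycle. This sketch pulls out the "Storing new sequence id" lines and shows the cycle restarting every 2 seconds, with the ring sequence id (logged in hex) advancing by 4 each time. The 2-second period happens to equal the configured `token: 2000`, but syslog timestamps only have 1-second resolution, so the log alone does not prove which timer fires.

```python
import re
from typing import List, Tuple

# "Storing new sequence id" lines from node 2's log, copied from the post.
LOG = """\
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b88
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b8c
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]: [TOTEM ] Storing new sequence id for ring 1b90
"""

# First HH:MM:SS on the line is the syslog timestamp; the ring id is hex.
SEQ_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}) .*Storing new sequence id for ring ([0-9a-f]+)$")


def ring_attempts(log: str) -> List[Tuple[int, int]]:
    """(seconds since midnight, ring sequence id) for each new ring node 2 tried to form."""
    attempts = []
    for line in log.splitlines():
        m = SEQ_RE.search(line)
        if m:
            hours, minutes, seconds = (int(g) for g in m.groups()[:3])
            attempts.append((hours * 3600 + minutes * 60 + seconds, int(m.group(4), 16)))
    return attempts


def deltas(attempts: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Seconds between attempts, and how far the sequence id advanced each time."""
    pairs = list(zip(attempts, attempts[1:]))
    return [b[0] - a[0] for a, b in pairs], [b[1] - a[1] for a, b in pairs]


if __name__ == "__main__":
    gaps, steps = deltas(ring_attempts(LOG))
    print("seconds between attempts:", gaps, "sequence id steps:", steps)
```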
tcpdump:
03:04:25.137038 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.187086 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.235829 IP 172.17.0.7.37971 > 10.20.27.52.5405: UDP, length 163
03:04:25.237123 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.287847 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
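Comparing the two captures shows how the NAT rewrites addresses: the source ports 57910 and 37971 appear unchanged on both sides, while the addresses differ. This sketch pairs flows across the captures by (source port, destination port), assuming the NAT preserves source ports (which the posted captures are consistent with). The captures were taken at different times, so these are the same flows, not the same packets; `flows` and `match_flows` are hypothetical helpers.

```python
import re
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]

# One line per direction from each capture in the post.
NODE1_CAPTURE = [
    "03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163",
    "03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163",
]
NODE2_CAPTURE = [
    "03:04:25.137038 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163",
    "03:04:25.235829 IP 172.17.0.7.37971 > 10.20.27.52.5405: UDP, length 163",
]

ENDPOINTS_RE = re.compile(r" IP (\S+)\.(\d+) > (\S+)\.(\d+):")


def endpoints(line: str) -> Tuple[Endpoint, Endpoint]:
    """Extract ((src_ip, src_port), (dst_ip, dst_port)) from a tcpdump summary line."""
    m = ENDPOINTS_RE.search(line)
    if m is None:
        raise ValueError(f"not a tcpdump IP line: {line!r}")
    src, sport, dst, dport = m.groups()
    return (src, int(sport)), (dst, int(dport))


def flows(capture: List[str]) -> Dict[Tuple[int, int], Tuple[str, str]]:
    """(src_port, dst_port) -> (src_ip, dst_ip) as seen in one capture."""
    result = {}
    for line in capture:
        (src, sport), (dst, dport) = endpoints(line)
        result[(sport, dport)] = (src, dst)
    return result


def match_flows(a: List[str], b: List[str]):
    """Pair flows across two captures by port pair (assumes source ports survive NAT)."""
    fa, fb = flows(a), flows(b)
    return {ports: (fa[ports], fb[ports]) for ports in fa.keys() & fb.keys()}


if __name__ == "__main__":
    matched = match_flows(NODE1_CAPTURE, NODE2_CAPTURE)
    for ports, (on_node1, on_node2) in sorted(matched.items()):
        print(ports, "node 1 sees", on_node1, "| node 2 sees", on_node2)
```

Each packet leaves with its container address as the source and arrives with the sender's server address as the source, so the two nodes never see the same address for the same peer.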
corosync-quorumtool:
Quorum information
------------------
Date: Fri Aug 29 02:57:19 2014
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 1862911301
Ring ID: 4488
Quorate: No
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
1862911301 1 i-a2542ffc (local)
_______________________________________________
discuss mailing list
http://lists.corosync.org/mailman/listinfo/discuss