I'm trying to get corosync running inside two Docker containers. One of
them is spewing out lots of "The token was lost in the COMMIT state."
messages. The other is simply logging "The consensus timeout expired."
(which, given the state of the other node, is expected).

Googling the commit state message turns up almost nothing, so I have no
clue what it means.

Both nodes run inside Docker containers, and each container's traffic is
NATed before it leaves the server (the transport is UDPU). I've taken
this into consideration and have manually set the nodeid for each node
so that it's not derived from the IP address.

tcpdump shows me that both nodes are receiving traffic from the other
node. However, the node that is throwing 'lost in COMMIT state' is only
sending a packet every few seconds, whereas the 'consensus timeout' node
is sending a ton of packets.


Node 1:
------------
Name: i-cd3b0393
Container IP: 172.17.0.21 (the IP corosync binds to)
Server IP: 10.20.27.52
Version: 2.3.3 (Fedora 20)


corosync.conf:
    totem {
      version: 2
      token: 2000
      token_retransmits_before_loss_const: 10
      vsftype: none
      secauth: off
      transport: udpu
    }

    logging {
      fileline: off
      syslog_facility: local2
      syslog_priority: debug
    }

    quorum {
      provider: corosync_votequorum
    }

    nodelist {
      node {
        nodeid: 1862911301
        ring0_addr: i-a2542ffc
      }
      node {
        nodeid: 2585129852
        ring0_addr: i-cd3b0393
      }
    }
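Since the nodeids were picked by hand, a quick sanity check on them may be worth having. The Python sketch below is not corosync's own validation: the rules it encodes are assumptions based on my reading of corosync.conf(5), namely that a nodeid is a 32-bit value, that 0 is reserved, and that some corosync clients expect a positive signed 32-bit id (the reason `clear_node_high_bit` exists for auto-generated ids). `NODELIST` and `check_nodeids` are hypothetical names. For this config it flags 2585129852, which is above 2^31; whether that has anything to do with the membership problem is not established here.

```python
# Hypothetical sanity check for the nodeids in the nodelist above.
# Assumed rules (paraphrasing corosync.conf(5), not corosync's code):
#   * a nodeid is a 32-bit unsigned value and 0 is reserved;
#   * some clients expect a positive signed 32-bit id, so a set
#     high bit is worth a warning.

NODELIST = {
    "i-a2542ffc": 1862911301,
    "i-cd3b0393": 2585129852,
}

UINT32_MAX = 2**32 - 1
HIGH_BIT = 1 << 31


def check_nodeids(nodelist):
    """Return human-readable warnings for a {ring0_addr: nodeid} mapping."""
    warnings = []
    seen = {}
    for name, nodeid in nodelist.items():
        if nodeid == 0:
            warnings.append(f"{name}: nodeid 0 is reserved")
        elif not 0 < nodeid <= UINT32_MAX:
            warnings.append(f"{name}: nodeid {nodeid} does not fit in 32 bits")
        elif nodeid & HIGH_BIT:
            # Reinterpret the unsigned value as a two's-complement int32.
            signed = nodeid - 2**32
            warnings.append(
                f"{name}: nodeid {nodeid} has the high bit set "
                f"(signed int32 value {signed})"
            )
        if nodeid in seen:
            warnings.append(f"{name}: nodeid {nodeid} duplicates {seen[nodeid]}")
        seen[nodeid] = name
    return warnings


if __name__ == "__main__":
    for warning in check_nodeids(NODELIST):
        print(warning)
```

Run against the nodelist above, it prints a single warning, for i-cd3b0393.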


/etc/hosts:
    172.17.0.21    i-cd3b0393
    10.20.50.204 i-a2542ffc


logs:
    Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
    Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
    Aug 29 02:53:18 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:19 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:21 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
    Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
    Aug 29 02:53:22 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:24 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:25 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
    Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
    Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
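To see the rhythm of node 1's failures rather than read it line by line, the excerpt can be tallied per message, with the gaps between consecutive "consensus timeout expired" events measured. This is a throwaway Python sketch over the lines pasted above; `summarize` is a hypothetical helper, and the gaps are only as precise as the one-second syslog timestamps. For comparison, corosync.conf(5) says that when `consensus` is not set it defaults to 1.2 × token, i.e. 2400 ms with `token: 2000`.

```python
from collections import Counter
from datetime import datetime

# Node 1 (i-cd3b0393) syslog excerpt, copied from above.
LOG = """\
Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
Aug 29 02:53:17 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Aug 29 02:53:18 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:19 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:21 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
Aug 29 02:53:21 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Aug 29 02:53:22 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:24 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:25 i-cd3b0393 local2.warn corosync[318]:  [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] The consensus timeout expired.
Aug 29 02:53:26 i-cd3b0393 local2.info corosync[318]:  [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
"""

TIMEOUT_MSG = "The consensus timeout expired."


def parse(line):
    """Split one syslog line into (time of day, corosync message text)."""
    # Columns 7..14 hold HH:MM:SS; every line in the excerpt shares a date.
    ts = datetime.strptime(line[7:15], "%H:%M:%S")
    # The message follows the "[TOTEM ] " / "[MAIN  ] " subsystem tag.
    message = line.split("] ", 1)[1]
    return ts, message


def summarize(log):
    """Count each distinct message; return gaps (s) between consensus timeouts."""
    counts = Counter()
    timeouts = []
    for raw in log.splitlines():
        line = raw.strip()
        if not line:
            continue
        ts, message = parse(line)
        counts[message] += 1
        if message == TIMEOUT_MSG:
            timeouts.append(ts)
    gaps = [(b - a).total_seconds() for a, b in zip(timeouts, timeouts[1:])]
    return counts, gaps


if __name__ == "__main__":
    counts, gaps = summarize(LOG)
    for message, n in counts.most_common():
        print(f"{n:3d}  {message[:60]}")
    print("seconds between consensus timeouts:", gaps)
```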


tcpdump:
    03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
    03:03:58.896435 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
    03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163
    03:03:58.946487 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
    03:03:58.996544 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
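The traffic asymmetry can be put into numbers by grouping the captured packets by flow. This minimal Python sketch only understands the one-line UDP format shown above (an assumption; other tcpdump output formats would need a different regex), and `flows` is a hypothetical helper. On this five-packet excerpt it shows 172.17.0.21 sending roughly every 50 ms and a single packet arriving from 10.20.50.204; a longer capture would give a fairer rate for the peer.

```python
import re
from collections import defaultdict

# Node 1 capture excerpt, copied from above.
CAPTURE = """\
03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.896435 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163
03:03:58.946487 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.996544 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
"""

# "HH:MM:SS.ffffff IP a.b.c.d.port > e.f.g.h.port: UDP, length N"
LINE_RE = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) IP (\S+)\.(\d+) > (\S+)\.(\d+): UDP, length \d+$"
)


def flows(capture):
    """Return {(src, dst): [timestamps in seconds since midnight]}."""
    out = defaultdict(list)
    for line in capture.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        hh, mm, ss, src, sport, dst, dport = m.groups()
        t = int(hh) * 3600 + int(mm) * 60 + float(ss)
        out[(f"{src}:{sport}", f"{dst}:{dport}")].append(t)
    return dict(out)


if __name__ == "__main__":
    for (src, dst), times in flows(CAPTURE).items():
        gaps = [b - a for a, b in zip(times, times[1:])]
        mean = sum(gaps) / len(gaps) if gaps else float("nan")
        print(f"{src} -> {dst}: {len(times)} packet(s), mean gap {mean:.3f} s")
```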


corosync-quorumtool:
    Quorum information
    ------------------
    Date:             Fri Aug 29 02:57:45 2014
    Quorum provider:  corosync_votequorum
    Nodes:            1
    Node ID:          2585129852
    Ring ID:          2904
    Quorate:          No

    Votequorum information
    ----------------------
    Expected votes:   2
    Highest expected: 2
    Total votes:      1
    Quorum:           2 Activity blocked
    Flags:            

    Membership information
    ----------------------
        Nodeid      Votes Name
    2585129852          1 i-cd3b0393 (local)
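The "Quorum: 2 Activity blocked" line follows from votequorum's simple-majority rule, which votequorum(5) describes as 50% of the votes + 1. A worked version in Python, assuming one vote per node and ignoring options such as `two_node` or `wait_for_all` (the function names are illustrative, not corosync APIs):

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: half the expected votes, rounded down, plus one.

    Assumption: no two_node / wait_for_all / last_man_standing adjustments.
    """
    return expected_votes // 2 + 1


def is_quorate(expected_votes: int, total_votes: int) -> bool:
    """True when the votes present reach the simple-majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # Figures from corosync-quorumtool: Expected votes 2, Total votes 1.
    print("quorum:", quorum_threshold(2))
    print("quorate:", is_quorate(2, 1))
```

With two one-vote nodes the threshold equals the total, so neither node can be quorate until both appear in a single membership.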


========================================

Node 2:
------------
Name: i-a2542ffc
Container IP: 172.17.0.7 (the IP corosync binds to)
Server IP: 10.20.50.204
Version: 2.3.3 (Fedora 20)


corosync.conf:
    totem {
      version: 2
      token: 2000
      token_retransmits_before_loss_const: 10
      vsftype: none
      secauth: off
      transport: udpu
    }

    logging {
      fileline: off
      syslog_facility: local2
      syslog_priority: debug
    }

    quorum {
      provider: corosync_votequorum
    }

    nodelist {
      node {
        nodeid: 1862911301
        ring0_addr: i-a2542ffc
      }
      node {
        nodeid: 2585129852
        ring0_addr: i-cd3b0393
      }
    }


/etc/hosts:
    172.17.0.7    i-a2542ffc
    10.20.27.52 i-cd3b0393
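Because of the NAT, the two /etc/hosts files deliberately give different answers: each node resolves its own name to its container address and the peer's name to the peer's server address. A small Python sketch (hypothetical `disagreements` helper, data copied from the two files) makes the split explicit:

```python
# /etc/hosts of each node, as {name: address}, copied from above.
HOSTS = {
    "node 1": {"i-cd3b0393": "172.17.0.21", "i-a2542ffc": "10.20.50.204"},
    "node 2": {"i-a2542ffc": "172.17.0.7", "i-cd3b0393": "10.20.27.52"},
}


def disagreements(hosts):
    """Return {name: {node: address}} for names the nodes resolve differently."""
    names = set().union(*(table.keys() for table in hosts.values()))
    result = {}
    for name in sorted(names):
        # .get() yields None where a node has no entry for the name.
        views = {node: table.get(name) for node, table in hosts.items()}
        if len(set(views.values())) > 1:
            result[name] = views
    return result


if __name__ == "__main__":
    for name, views in disagreements(HOSTS).items():
        print(name, views)
```

It reports both names: node 1 resolves i-a2542ffc to 10.20.50.204, while node 2 resolves the same name to 172.17.0.7, and likewise for i-cd3b0393.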


logs:
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b88
    Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b8c
    Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b90
    Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.
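Node 2's excerpt is a tight loop. Pulling out the state transitions, the ring sequence ids and the time between COMMIT entries (Python sketch; `analyse` is a hypothetical helper over the lines pasted above) shows GATHER → COMMIT repeating every 2 s, matching `token: 2000` to within the one-second resolution of the timestamps, with the ring sequence id going up by 4 each round. My reading, which is an assumption rather than something the log states outright: this node, as the representative, creates a commit token on entering COMMIT, and that token does not make it back before the token timeout fires.

```python
import re
from datetime import datetime

# Node 2 (i-a2542ffc) syslog excerpt, copied from above.
LOG = """\
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b88
Aug 29 02:53:03 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b8c
Aug 29 02:53:05 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] The token was lost in the COMMIT state.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering GATHER state from 4(The token was lost in the COMMIT state.).
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Creating commit token because I am the rep.
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] Storing new sequence id for ring 1b90
Aug 29 02:53:07 i-a2542ffc local2.info corosync[279]:  [TOTEM ] entering COMMIT state.
"""

STATE_RE = re.compile(r"^entering (\w+) state")
RING_RE = re.compile(r"^Storing new sequence id for ring ([0-9a-f]+)$")


def analyse(log):
    """Return (state sequence, ring-id increments, seconds between COMMIT entries)."""
    states, rings, commits = [], [], []
    for raw in log.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Columns 7..14 hold HH:MM:SS; the message follows "[TOTEM ] ".
        ts = datetime.strptime(line[7:15], "%H:%M:%S")
        message = line.split("] ", 1)[1]
        state = STATE_RE.match(message)
        if state:
            states.append(state.group(1))
            if state.group(1) == "COMMIT":
                commits.append(ts)
        ring = RING_RE.match(message)
        if ring:
            # The ring sequence id is logged in hex (e.g. 1b88).
            rings.append(int(ring.group(1), 16))
    ring_steps = [b - a for a, b in zip(rings, rings[1:])]
    commit_gaps = [(b - a).total_seconds() for a, b in zip(commits, commits[1:])]
    return states, ring_steps, commit_gaps


if __name__ == "__main__":
    states, ring_steps, commit_gaps = analyse(LOG)
    print(" -> ".join(states))
    print("ring id steps:", ring_steps, "seconds between COMMITs:", commit_gaps)
```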


tcpdump:
    03:04:25.137038 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
    03:04:25.187086 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
    03:04:25.235829 IP 172.17.0.7.37971 > 10.20.27.52.5405: UDP, length 163
    03:04:25.237123 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
    03:04:25.287847 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
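Putting the two captures side by side suggests what the NAT does to corosync's packets: the source address is rewritten to the server IP while the source port survives (57910 and 37971 appear unchanged on both sides). That is an inference from two five-packet excerpts, not a statement about Docker's NAT in general. The Python sketch below (hypothetical `sources` and `translations` helpers) derives the mapping by matching source ports:

```python
import re

# Capture excerpts copied from above.
NODE1 = """\
03:03:58.846382 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.896435 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.945786 IP 10.20.50.204.37971 > 172.17.0.21.5405: UDP, length 163
03:03:58.946487 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
03:03:58.996544 IP 172.17.0.21.57910 > 10.20.50.204.5405: UDP, length 163
"""
NODE2 = """\
03:04:25.137038 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.187086 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.235829 IP 172.17.0.7.37971 > 10.20.27.52.5405: UDP, length 163
03:04:25.237123 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
03:04:25.287847 IP 10.20.27.52.57910 > 172.17.0.7.5405: UDP, length 163
"""

ADDR_RE = re.compile(r" IP (\S+)\.(\d+) > (\S+)\.(\d+): UDP")


def sources(capture):
    """Set of (ip, port) source endpoints in a tcpdump excerpt."""
    found = set()
    for line in capture.splitlines():
        m = ADDR_RE.search(line)
        if m:
            found.add((m.group(1), int(m.group(2))))
    return found


def translations(local_capture, remote_capture, local_ip):
    """Map each (local_ip, port) the local node sends from to the source the
    remote node sees, assuming the NAT leaves the source port unchanged."""
    table = {}
    for ip, port in sources(local_capture):
        if ip != local_ip:
            continue
        for rip, rport in sources(remote_capture):
            if rport == port and rip != ip:
                table[(ip, port)] = (rip, rport)
    return table


if __name__ == "__main__":
    print(translations(NODE1, NODE2, "172.17.0.21"))
    print(translations(NODE2, NODE1, "172.17.0.7"))
```

Each node therefore sees its peer at the peer's server address, which is what its /etc/hosts says, while the peer itself is bound to its container address.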


corosync-quorumtool:
    Quorum information
    ------------------
    Date:             Fri Aug 29 02:57:19 2014
    Quorum provider:  corosync_votequorum
    Nodes:            1
    Node ID:          1862911301
    Ring ID:          4488
    Quorate:          No

    Votequorum information
    ----------------------
    Expected votes:   2
    Highest expected: 2
    Total votes:      1
    Quorum:           2 Activity blocked
    Flags:            

    Membership information
    ----------------------
        Nodeid      Votes Name
    1862911301          1 i-a2542ffc (local)

_______________________________________________
discuss mailing list
http://lists.corosync.org/mailman/listinfo/discuss
