*From: *Steven Dake <[email protected]>
*Sent: * 2013-09-30 18:12:25 E
*To: *Patrick Hemmer <[email protected]>
*CC: *[email protected]
*Subject: *Re: [corosync] Issue starting the CMAP service

> On 09/30/2013 02:43 PM, Patrick Hemmer wrote:
>> *From: *Steven Dake <[email protected]>
>> *Sent: * 2013-09-30 16:50:26 E
>> *To: *Patrick Hemmer <[email protected]>
>> *CC: *[email protected]
>> *Subject: *Re: [corosync] Issue starting the CMAP service
>>
>>> On 09/30/2013 01:45 PM, Patrick Hemmer wrote:
>>>> I'm running corosync 2.3.2 on ubuntu precise. I'm playing with a 3
>>>> node cluster, and whenever I try to start corosync on one of the
>>>> nodes, it fails to start properly.
>>>> I just do a simple start with `corosync -f`, and whenever I try to
>>>> use any of the tools, they error:
>>>>
>>>> # corosync-cmapctl
>>>> Failed to initialize the cmap API. Error CS_ERR_TRY_AGAIN
>>>> # corosync-quorumtool
>>>> Cannot initialize CMAP service
>>>>
>>>> If I wait long enough (about 9 minutes or 530 seconds), it does end
>>>> up starting, and the tools work, but corosync-quorumtool shows the
>>>> only member is itself.
>>>>
>>>> However if I start corosync with `strace -f corosync -f` the tools
>>>> work fine immediately upon start (though it still doesn't show the
>>>> other nodes). Smells like race condition, but dunno where to begin.
>>>>
>>>
>>> My guess is something is wrong with your network relating to
>>> multicast. Try using udpu mode - it is very stable now and removes
>>> multicast from the list of things that can go wrong.
>>>
>>
>> I am using udpu, see the config :-)
>>
>
> I assume you have the same config on all nodes? If so, try using ip
> addresses for the ring id. possibly a DNS resolution problem?
>
> Other than that, I'm stumped

Yes, exact same config on all nodes. All hosts are present in /etc/hosts.
Also when I do a tcpdump on the other nodes, I see traffic on port 5405
coming from the node in question.

>
> Regards
> -steve
>
>>> Regards
>>> -steve
>>>
>>>>
>>>> This is the output from `corosync -f` (this node is 10.20.0.212):
>>>> notice [TOTEM ] Initializing transport (UDP/IP Unicast).
>>>> notice [TOTEM ] Initializing transmit/receive security (NSS)
>>>> crypto: none hash: none
>>>> notice [TOTEM ] The network interface [10.20.0.212] is now up.
>>>> notice [TOTEM ] adding new UDPU member {10.20.0.127}
>>>> notice [TOTEM ] adding new UDPU member {10.20.0.212}
>>>> notice [TOTEM ] adding new UDPU member {10.20.2.124}
>>>> notice [TOTEM ] A new membership (10.20.0.212:1122820) was formed.
>>>> Members joined: 2
>>>> notice [TOTEM ] A new membership (10.20.0.127:1122824) was formed.
>>>> Members joined: 1 3
>>>> ### here is where it pauses for almost 9 minutes ###
>>>> error [TOTEM ] FAILED TO RECEIVE
>>>> notice [TOTEM ] A new membership (10.20.0.212:1122876) was formed.
>>>> Members left: 1 3
>>>> notice [TOTEM ] A new membership (10.20.0.212:1122936) was formed.
>>>> Members
>>>> notice [TOTEM ] A new membership (10.20.0.212:1123008) was formed.
>>>> Members
>>>> notice [TOTEM ] A new membership (10.20.0.212:1123064) was formed.
>>>> Members
>>>> notice [TOTEM ] A new membership (10.20.0.212:1123124) was formed.
>>>> Members
>>>> notice [TOTEM ] A new membership (10.20.0.212:1123180) was formed.
>>>> Members
>>>> notice [TOTEM ] A new membership (10.20.0.212:1123248) was formed.
>>>> Members
>>>> notice [TOTEM ] A new membership (10.20.0.127:1123256) was formed.
>>>> Members joined: 1 3
>>>>
>>>> This is the config (created by `pcs` utility), it's exactly the
>>>> same on all 3 nodes, and the other 2 nodes work fine:
>>>> ----
>>>> totem {
>>>>     version: 2
>>>>     secauth: off
>>>>     cluster_name: hapi-server
>>>>     transport: udpu
>>>> }
>>>>
>>>> nodelist {
>>>>     node {
>>>>         ring0_addr: i-74eb9c2f
>>>>         nodeid: 1
>>>>     }
>>>>     node {
>>>>         ring0_addr: i-a3bf0df9
>>>>         nodeid: 2
>>>>     }
>>>>     node {
>>>>         ring0_addr: i-ebcfcbb0
>>>>         nodeid: 3
>>>>     }
>>>> }
>>>>
>>>> quorum {
>>>>     provider: corosync_votequorum
>>>> }
>>>>
>>>> logging {
>>>>     to_syslog: yes
>>>> }
>>>> ----
>>>>
>>>> -Patrick
>>>>
>>>> _______________________________________________
>>>> discuss mailing list
>>>> [email protected]
>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>
>>
>

Here's some additional info from the command line utils after waiting 9
minutes for it to come up:

# corosync-quorumtool
Quorum information
------------------
Date:             Mon Sep 30 22:16:24 2013
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          2
Ring ID:          1124320
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
         2          1 i-a3bf0df9 (local)

# corosync-cmapctl |grep member
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.20.0.127)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 15
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.20.0.212)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(10.20.2.124)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 15
runtime.totem.pg.mrp.srp.members.3.status (str) = joined

-Patrick
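
For reference, Steven's suggestion to use IP addresses instead of host names
for the ring addresses could look roughly like the nodelist below. This is
only a sketch and was not tested in this thread. The nodeid-to-address
mapping is an assumption inferred from the corosync-cmapctl output above
(members.1, members.2 and members.3), so check it against each host before
using it. The totem, quorum and logging sections would stay as they are.

----
# Sketch only: addresses inferred from the runtime.totem.pg.mrp.srp.members.*
# keys shown above, not taken from a known-good configuration.
nodelist {
    node {
        ring0_addr: 10.20.0.127
        nodeid: 1
    }
    node {
        ring0_addr: 10.20.0.212
        nodeid: 2
    }
    node {
        ring0_addr: 10.20.2.124
        nodeid: 3
    }
}
----

Using literal addresses takes name resolution out of the picture, which
would help confirm or rule out the DNS theory even though all hosts are
already listed in /etc/hosts.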
_______________________________________________
discuss mailing list
[email protected]
http://lists.corosync.org/mailman/listinfo/discuss
