*From: *Steven Dake <[email protected]>
*Sent: * 2013-09-30 18:12:25 E
*To: *Patrick Hemmer <[email protected]>
*CC: *[email protected]
*Subject: *Re: [corosync] Issue starting the CMAP service

> On 09/30/2013 02:43 PM, Patrick Hemmer wrote:
>> *From: *Steven Dake <[email protected]>
>> *Sent: * 2013-09-30 16:50:26 E
>> *To: *Patrick Hemmer <[email protected]>
>> *CC: *[email protected]
>> *Subject: *Re: [corosync] Issue starting the CMAP service
>>
>>> On 09/30/2013 01:45 PM, Patrick Hemmer wrote:
>>>> I'm running corosync 2.3.2 on Ubuntu Precise. I'm playing with a 3
>>>> node cluster, and whenever I try to start corosync on one of the
>>>> nodes, it fails to start properly.
>>>> I just do a simple start with `corosync -f`, and whenever I try to 
>>>> use any of the tools, they error:
>>>>
>>>> # corosync-cmapctl
>>>> Failed to initialize the cmap API. Error CS_ERR_TRY_AGAIN
>>>> # corosync-quorumtool
>>>> Cannot initialize CMAP service
>>>>
>>>> If I wait long enough (about 9 minutes or 530 seconds), it does end
>>>> up starting, and the tools work, but corosync-quorumtool shows the
>>>> only member is itself.
>>>>
>>>> However if I start corosync with `strace -f corosync -f` the tools
>>>> work fine immediately upon start (though it still doesn't show the
>>>> other nodes). Smells like a race condition, but dunno where to begin.
>>>>
>>>>
>>>
>>> My guess is something is wrong with your network relating to
>>> multicast.  Try using udpu mode - it is very stable now and removes
>>> multicast from the list of things that can go wrong.
>>>
>>
>> I am using udpu, see the config :-)
>>
>>
> I assume you have the same config on all nodes?  If so, try using IP
> addresses for ring0_addr instead of hostnames.  Possibly a DNS
> resolution problem?
>
> Other than that, I'm stumped.

Yes, exact same config on all nodes. All hosts are present in
/etc/hosts. Also when I do a tcpdump on the other nodes, I see traffic
on port 5405 coming from the node in question.
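
For anyone retracing this, here is a minimal Python sketch of the name
check. It only asks the local resolver (which typically consults
/etc/hosts first) what each ring0_addr name maps to; it is not part of
corosync. The helper name resolve_all and its lookup parameter are made
up for illustration. The hostnames are the ones from the nodelist in the
config quoted below.

```python
import socket

# Hostnames copied from the ring0_addr entries in the quoted nodelist.
RING0_NAMES = ["i-74eb9c2f", "i-a3bf0df9", "i-ebcfcbb0"]


def resolve_all(names, lookup=socket.gethostbyname_ex):
    """Map each name to its sorted IPv4 addresses, or to the lookup error text.

    `lookup` is a hypothetical injection point so the function can be
    exercised without touching the real resolver.
    """
    results = {}
    for name in names:
        try:
            _canonical, _aliases, addresses = lookup(name)
            results[name] = sorted(addresses)
        except OSError as exc:
            # socket.gaierror and socket.herror are both OSError subclasses.
            results[name] = "lookup failed: %s" % exc
    return results
```

Calling resolve_all(RING0_NAMES) on each node and comparing the results
against the addresses in the `corosync -f` output (10.20.0.127,
10.20.0.212, 10.20.2.124) would show whether any node resolves a name
differently from the others.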

>
> Regards
> -steve
>
>>> Regards
>>> -steve
>>>
>>>>
>>>> This is the output from `corosync -f` (this node is 10.20.0.212):
>>>> notice  [TOTEM ] Initializing transport (UDP/IP Unicast).
>>>> notice  [TOTEM ] Initializing transmit/receive security (NSS)
>>>> crypto: none hash: none
>>>> notice  [TOTEM ] The network interface [10.20.0.212] is now up.
>>>> notice  [TOTEM ] adding new UDPU member {10.20.0.127}
>>>> notice  [TOTEM ] adding new UDPU member {10.20.0.212}
>>>> notice  [TOTEM ] adding new UDPU member {10.20.2.124}
>>>> notice  [TOTEM ] A new membership (10.20.0.212:1122820) was formed.
>>>> Members joined: 2
>>>> notice  [TOTEM ] A new membership (10.20.0.127:1122824) was formed.
>>>> Members joined: 1 3
>>>> ### here is where it pauses for almost 9 minutes ###
>>>> error   [TOTEM ] FAILED TO RECEIVE
>>>> notice  [TOTEM ] A new membership (10.20.0.212:1122876) was formed.
>>>> Members left: 1 3
>>>> notice  [TOTEM ] A new membership (10.20.0.212:1122936) was formed.
>>>> Members
>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123008) was formed.
>>>> Members
>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123064) was formed.
>>>> Members
>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123124) was formed.
>>>> Members
>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123180) was formed.
>>>> Members
>>>> notice  [TOTEM ] A new membership (10.20.0.212:1123248) was formed.
>>>> Members
>>>> notice  [TOTEM ] A new membership (10.20.0.127:1123256) was formed.
>>>> Members joined: 1 3
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> This is the config (created by `pcs` utility), it's exactly the
>>>> same on all 3 nodes, and the other 2 nodes work fine:
>>>> ----
>>>> totem {
>>>> version: 2
>>>> secauth: off
>>>> cluster_name: hapi-server
>>>> transport: udpu
>>>> }
>>>>
>>>> nodelist {
>>>>   node {
>>>>         ring0_addr: i-74eb9c2f
>>>>         nodeid: 1
>>>>        }
>>>>   node {
>>>>         ring0_addr: i-a3bf0df9
>>>>         nodeid: 2
>>>>        }
>>>>   node {
>>>>         ring0_addr: i-ebcfcbb0
>>>>         nodeid: 3
>>>>        }
>>>> }
>>>>
>>>> quorum {
>>>> provider: corosync_votequorum
>>>> }
>>>>
>>>> logging {
>>>> to_syslog: yes
>>>> }
>>>> ----
>>>>
>>>>
>>>>
>>>> -Patrick
>>>>
>>>>
>>>> _______________________________________________
>>>> discuss mailing list
>>>> [email protected]
>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>
>>
>



Here's some additional info from the command line utils after waiting 9
minutes for it to come up:

# corosync-quorumtool
Quorum information
------------------
Date:             Mon Sep 30 22:16:24 2013
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          2
Ring ID:          1124320
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:           

Membership information
----------------------
    Nodeid      Votes Name
         2          1 i-a3bf0df9 (local)
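
The "Quorum: 2 Activity blocked" line is consistent with simple majority
arithmetic. A minimal Python sketch, assuming the default majority rule
and ignoring special votequorum options such as two_node; the function
names are illustrative, not a corosync API:

```python
def quorum_threshold(expected_votes):
    """Simple majority: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes, expected_votes):
    """True when the votes currently present reach the majority threshold."""
    return total_votes >= quorum_threshold(expected_votes)


# Values taken from the corosync-quorumtool output above.
EXPECTED_VOTES = 3
TOTAL_VOTES = 1

threshold = quorum_threshold(EXPECTED_VOTES)
quorate = is_quorate(TOTAL_VOTES, EXPECTED_VOTES)
```

With 3 expected votes the threshold is 2, so a node that sees only its
own vote stays non-quorate until at least one other node is visible.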


# corosync-cmapctl |grep member
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.20.0.127)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 15
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.20.0.212)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(10.20.2.124)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 15
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
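
To compare join counts across nodes, the member keys above can be grouped
by node ID. A rough Python sketch: parse_members and rejoined_nodes are
made-up helpers, the regular expression only targets the
runtime.totem.pg.mrp.srp.members.* lines in the form shown above, and
reading join_count as "how many times this node has seen that member
join" is an assumption, not something confirmed in this thread.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 15
MEMBER_LINE = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.(\w+) \((\w+)\) = (.*)$"
)

UNSIGNED_TYPES = {"u8", "u16", "u32", "u64"}

# A few lines copied from the corosync-cmapctl output above.
SAMPLE = """\
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 15
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.20.0.212)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 15
"""


def parse_members(text):
    """Group member keys by node ID; unsigned integer values become int."""
    members = defaultdict(dict)
    for line in text.splitlines():
        match = MEMBER_LINE.match(line.strip())
        if match is None:
            continue  # not a member line, skip it
        nodeid, key, type_name, value = match.groups()
        members[int(nodeid)][key] = (
            int(value) if type_name in UNSIGNED_TYPES else value
        )
    return dict(members)


def rejoined_nodes(members):
    """Node IDs whose join_count is greater than 1, in ascending order."""
    return sorted(
        nodeid
        for nodeid, keys in members.items()
        if keys.get("join_count", 0) > 1
    )
```

Applied to the output above, this separates nodes 1 and 3 (join_count 15)
from the local node 2 (join_count 1), which lines up with the repeated
"A new membership ... was formed" messages in the log.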



-Patrick
