Hi,

It looks like there's an issue with the underlying memcached process. Can you check whether memcached is running on all your Sprout nodes, and check the memcached logs in /var/log/memcached.log and /var/log/syslog?
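One way to do the first check from any machine on the same network is a plain TCP connect to the memcached port on each Sprout node. The sketch below assumes the port 11211 used in the cluster_settings file quoted in this thread; the function names are hypothetical. A successful connect only shows that something is listening on that port, not that memcached is healthy, so the logs are still worth reading.

```python
import socket

# Assumption: memcached listens on 11211, as in the quoted cluster_settings.
MEMCACHED_PORT = 11211


def memcached_reachable(host, port=MEMCACHED_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution errors.
        return False


def check_nodes(nodes, port=MEMCACHED_PORT, timeout=2.0):
    """Map each host to the result of the reachability probe."""
    return {host: memcached_reachable(host, port, timeout) for host in nodes}
```

For example, check_nodes(["192.168.1.21", "192.168.1.22", "192.168.1.23"]) returns a dict keyed by address, with False for any node where the connect failed.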
Also, can you confirm that all your nodes are running Anna Karenina, and that NTP is running on your Sprout nodes?

Ellie

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Lianjie Cao
Sent: 26 March 2015 15:55
To: [email protected]; Sharma, Puneet; Sonia Fahmy
Subject: [Clearwater] Sprout clustering problem

Hi,

Our current Clearwater deployment has 1 Bono node, 2 Sprout nodes, 2 Homestead nodes and 1 Homer node running the Anna Karenina release. Everything works fine. However, when we try to add a third Sprout node, we get error responses during the registration test, such as "500 Server Internal Error" and "401 Unauthorized" for the second register request.

To cluster the Sprout nodes, we followed the instructions and modified cluster_settings and chronos.conf as below.

[sprout]cw@sprout-3:~$ cat /etc/clearwater/cluster_settings
servers=192.168.1.21:11211, 192.168.1.22:11211, 192.168.1.23:11211

[sprout]cw@sprout-3:~$ cat /etc/chronos/chronos.conf
[http]
bind-address = 192.168.1.23
bind-port = 7253
threads = 50

[logging]
folder = /var/log/chronos
level = 2

[cluster]
localhost = 192.168.1.23
node = 192.168.1.21
node = 192.168.1.22
node = 192.168.1.23

[alarms]
enabled = true

[exceptions]
max_ttl = 600

The three nodes share the same files except for "bind-address = 192.168.1.23" and "localhost = 192.168.1.23" in chronos.conf.
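As a quick way to inspect a cluster_settings file like the one quoted here, the sketch below splits the servers= line into host and port pairs, and notes entries that carry surrounding whitespace or whose host part is not a literal IP address. The function names are hypothetical, and nothing in this thread establishes whether either condition matters to Sprout; the sketch only makes the file contents easier to compare across nodes.

```python
import ipaddress


def parse_servers(text):
    """Parse the servers= line of a cluster_settings file into a list of dicts."""
    results = []
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep or key.strip() != "servers":
            continue
        for raw in value.split(","):
            entry = raw.strip()
            if not entry:
                continue  # skip empty items such as a trailing comma
            host, _, port = entry.rpartition(":")
            results.append({
                "raw": raw,
                "host": host,
                "port": int(port),
                # True when the entry had leading or trailing whitespace.
                "padded": raw != entry,
            })
    return results


def is_ip_literal(host):
    """Return True if host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False
```

Running parse_servers on the contents of /etc/clearwater/cluster_settings from each Sprout node and comparing the resulting lists is one way to confirm that the files really are identical.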
In the Sprout log, we found error messages like this on all 3 Sprout nodes with log level 5 (complete logs are attached):

26-03-2015 07:49:16.717 UTC Debug registrar.cpp:618: REGISTER for public ID sip:[email protected] uses AOR sip:[email protected]
26-03-2015 07:49:16.717 UTC Debug regstore.cpp:120: Get AoR data for sip:[email protected]
26-03-2015 07:49:16.717 UTC Debug memcachedstore.cpp:193: Key reg\\sip:[email protected] hashes to vbucket 14 via hash 0xe25f38e
26-03-2015 07:49:16.717 UTC Debug memcachedstore.cpp:365: 2 read replicas for key reg\\sip:[email protected]
26-03-2015 07:49:16.717 UTC Debug memcachedstore.cpp:400: Attempt to read from replica 0 (connection 0x7f6ce000ca30)
26-03-2015 07:49:16.718 UTC Debug memcachedstore.cpp:423: Read for reg\\sip:[email protected] on replica 0 returned error 2 (getaddrinfo() or getnameinfo() HOSTNAME LOOKUP FAILURE)
26-03-2015 07:49:16.718 UTC Debug memcachedstore.cpp:400: Attempt to read from replica 1 (connection 0x7f6ce00129e0)
26-03-2015 07:49:16.720 UTC Debug memcachedstore.cpp:423: Read for reg\\sip:[email protected] on replica 1 returned error 2 (getaddrinfo() or getnameinfo() HOSTNAME LOOKUP FAILURE)
26-03-2015 07:49:16.720 UTC Error memcachedstore.cpp:512: Failed to read data for reg\\sip:[email protected] from 2 replicas
26-03-2015 07:49:16.720 UTC Debug registrar.cpp:242: Retrieved AoR data (nil)
26-03-2015 07:49:16.720 UTC Error registrar.cpp:249: Failed to get AoR binding for sip:[email protected] from store
26-03-2015 07:49:16.720 UTC Debug pjsip: endpoint Response msg 500/REGISTER/cseq=2 (tdta0x7f6ce0081900) created
26-03-2015 07:49:16.720 UTC Verbose common_sip_processing.cpp:136: TX 449 bytes Response msg 500/REGISTER/cseq=2 (tdta0x7f6ce0081900) to TCP 192.168.1.11:36197:
26-03-2015 07:49:16.933 UTC Debug memcachedstore.cpp:234: Set up new view 1 for thread
26-03-2015 07:49:16.933 UTC Debug memcachedstore.cpp:241: Setting up server 0 for connection 0x7f6d10040e80 (--CONNECT-TIMEOUT=10 --SUPPORT-CAS --POLL-TIMEOUT=250 --BINARY-PROTOCOL)
26-03-2015 07:49:16.933 UTC Debug memcachedstore.cpp:243: Set up connection 0x7f6d10041460 to server 192.168.1.21:11211
26-03-2015 07:49:16.933 UTC Debug memcachedstore.cpp:254: Setting server to IP address 192.168.1.21 port 11211
26-03-2015 07:49:16.933 UTC Debug memcachedstore.cpp:241: Setting up server 1 for connection 0x7f6d10040e80 (--CONNECT-TIMEOUT=10 --SUPPORT-CAS --POLL-TIMEOUT=250 --BINARY-PROTOCOL)
26-03-2015 07:49:16.933 UTC Debug memcachedstore.cpp:243: Set up connection 0x7f6d10045be0 to server 192.168.1.22:11211
26-03-2015 07:49:16.933 UTC Debug memcachedstore.cpp:254: Setting server to IP address 192.168.1.22 port 11211
26-03-2015 07:49:16.936 UTC Debug memcachedstore.cpp:241: Setting up server 2 for connection 0x7f6d10040e80 (--CONNECT-TIMEOUT=10 --SUPPORT-CAS --POLL-TIMEOUT=250 --BINARY-PROTOCOL)
26-03-2015 07:49:16.936 UTC Debug memcachedstore.cpp:243: Set up connection 0x7f6d1004b3c0 to server 192.168.1.23:11211
26-03-2015 07:49:16.936 UTC Debug memcachedstore.cpp:254: Setting server to IP address 192.168.1.23 port 11211
26-03-2015 07:49:16.938 UTC Debug memcachedstore.cpp:560: 2 write replicas for key av\\[email protected]\3a44ea384df9161e
26-03-2015 07:49:16.939 UTC Debug memcachedstore.cpp:612: Attempt conditional write to vbucket 93 on replica 0 (connection 0x7f6d1004b3c0), CAS = 0, expiry = 40
26-03-2015 07:49:16.939 UTC Debug memcachedstore.cpp:812: Attempting to add data for key av\\[email protected]\3a44ea384df9161e
26-03-2015 07:49:16.939 UTC Debug memcachedstore.cpp:822: Attempting memcached ADD command
26-03-2015 07:49:16.942 UTC Debug memcachedstore.cpp:912: ADD/CAS returned rc = 5 (WRITE FAILURE) (140106396906432) getaddrinfo() or getnameinfo() HOSTNAME LOOKUP FAILURE, Name or service not known, host: 192.168.1.23:11211 -> libmemcached/connect.cc:197
26-03-2015 07:49:16.942 UTC Debug memcachedstore.cpp:612: Attempt conditional write to vbucket 93 on replica 1 (connection 0x7f6d10045be0), CAS = 0, expiry = 40
26-03-2015 07:49:16.942 UTC Debug memcachedstore.cpp:812: Attempting to add data for key av\\[email protected]\3a44ea384df9161e
26-03-2015 07:49:16.942 UTC Debug memcachedstore.cpp:822: Attempting memcached ADD command
26-03-2015 07:49:16.945 UTC Debug memcachedstore.cpp:912: ADD/CAS returned rc = 5 (WRITE FAILURE) (140106396883936) getaddrinfo() or getnameinfo() HOSTNAME LOOKUP FAILURE, Name or service not known, host: 192.168.1.22:11211 -> libmemcached/connect.cc:197
26-03-2015 07:49:16.945 UTC Error memcachedstore.cpp:715: Failed to write data for av\\[email protected]\3a44ea384df9161e to 2 replicas
26-03-2015 07:49:16.945 UTC Error avstore.cpp:71: Failed to write Authentication Vector for private_id [email protected]
26-03-2015 07:49:16.945 UTC Verbose common_sip_processing.cpp:136: TX 569 bytes Response msg 401/REGISTER/cseq=1 (tdta0x7f6d10006fa0) to TCP 192.168.1.11:40630:

Can you help us with this?

Thanks,
Lianjie

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater
