Asra, I'm replying on Chris' behalf because he's a bit busy at the moment.
Sorry that those links didn't help you. Could you be a bit more specific about the questions you're trying to answer on redundancy, load balancing and scalability?

Clearwater uses an active/active cluster fault-tolerance model. In your example, the first and second Sprout nodes should share load while both are active, and Bono should route traffic only to the second Sprout node if the first is unavailable for any reason. The behavior you're seeing looks odd. Please can you:

· confirm the topology you're running: is it 1 Bono, 2 Sprouts, 1 Homestead, or something different?

· confirm the IP addresses of the first and second Sprouts, and which one is available. Below you said that the first Sprout was 10.0.0.4, but the "DNS Scale.txt" file you shared showed that the first Sprout was 10.0.0.3 and the second was 10.0.0.4.

· check the Bono logs for DNS resolution, to see whether the DNS server is returning both Sprout IP addresses. For example, on one of my test servers I see logs of the following form:

    01-06-2016 14:00:11.204 UTC Status connection_pool.cpp:447: Recycle TCP connection slot 39
    01-06-2016 14:00:11.204 UTC Debug sipresolver.cpp:86: SIPResolver::resolve for name icscf.sprout.staging.cw-ngv.com, port 0, transport 6, family 2
    01-06-2016 14:00:11.204 UTC Debug baseresolver.cpp:514: Attempt to parse icscf.sprout.staging.cw-ngv.com as IP address
    01-06-2016 14:00:11.204 UTC Debug dnscachedresolver.cpp:724: Removing record for ip-10-0-0-132.ec2.internal (type 1, expiry time 1464789591) from the expiry list
    01-06-2016 14:00:11.204 UTC Debug dnscachedresolver.cpp:724: Removing record for _sip._tcp.icscf.sprout.staging.cw-ngv.com (type 33, expiry time 1464789610) from the expiry list
    01-06-2016 14:00:11.204 UTC Verbose dnscachedresolver.cpp:245: Check cache for _sip._tcp.icscf.sprout.staging.cw-ngv.com type 33
    01-06-2016 14:00:11.204 UTC Debug dnscachedresolver.cpp:278: Expired entry found in cache - starting asynchronous query to update it
    01-06-2016 14:00:11.204 UTC Debug dnscachedresolver.cpp:302: Create and execute DNS query transaction
    01-06-2016 14:00:11.204 UTC Debug dnscachedresolver.cpp:315: Wait for query responses
    01-06-2016 14:00:11.209 UTC Debug dnscachedresolver.cpp:465: Received DNS response for _sip._tcp.icscf.sprout.staging.cw-ngv.com type SRV
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:90: Parsing DNS message
    000000: 381f8180 00010001 00000000 045f7369 70045f74 63700569 63736366 06737072  8... .... .... ._si p._t cp.i cscf .spr
    000020: 6f757407 73746167 696e6706 63772d6e 67760363 6f6d0000 210001c0 0c002100  out. stag ing. cw-n gv.c om.. !... ..!.
    000040: 01000000 3c002200 01000113 bc0d6970 2d31302d 302d302d 31333203 65633208  .... <.". .... ..ip -10- 0-0- 132. ec2.
    000060: 696e7465 726e616c 00                                                      inte rnal .
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:95: Parsing header at offset 0x0
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:98: 1 questions, 1 answers, 0 authorities, 0 additional records
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:103: Parsing question 1 at offset 0xc
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:229: Parsed domain name = _sip._tcp.icscf.sprout.staging.cw-ngv.com, encoded length = 43
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:112: Parsing answer 1 at offset 0x3b
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:229: Parsed domain name = _sip._tcp.icscf.sprout.staging.cw-ngv.com, encoded length = 2
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:282: Resource Record NAME=_sip._tcp.icscf.sprout.staging.cw-ngv.com TYPE=SRV CLASS=IN TTL=60 RDLENGTH=34
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:309: Parse SRV record RDATA
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:229: Parsed domain name = ip-10-0-0-132.ec2.internal, encoded length = 28
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:142: Answer records
      _sip._tcp.icscf.sprout.staging.cw-ngv.com 60 IN SRV 1 1 5052 ip-10-0-0-132.ec2.internal
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:143: Authority records
    01-06-2016 14:00:11.209 UTC Debug dnsparser.cpp:144: Additional records

· if Bono is not recording both Sprout IP addresses, check that your DNS server has reloaded its configuration since you changed it, and test it with nslookup or dig.

· check whether the connection attempts to the other IP address are succeeding or failing. Again, I see logs of the following form on my test servers:

    01-06-2016 14:00:34.215 UTC Verbose pjsip: tcpc0x7f15b415 TCP client transport created
    01-06-2016 14:00:34.215 UTC Verbose pjsip: tcpc0x7f15b415 TCP transport 10.0.0.52:37215 is connecting to 10.0.0.132:5052...
    01-06-2016 14:00:34.215 UTC Debug connection_pool.cpp:248: Created transport tcpc0x7f15b4152dc8 in slot 25 (10.0.0.52:37215 to 10.0.0.132:5052)
    01-06-2016 14:00:34.225 UTC Verbose pjsip: tcpc0x7f15b415 TCP transport 10.0.0.52:37215 is connected to 10.0.0.132:5052
    01-06-2016 14:00:34.225 UTC Debug connection_pool.cpp:336: Transport tcpc0x7f15b4152dc8 in slot 25 has connected

· if you're not sure about any of the above, share the full Bono logs.

I hope that helps.

Thanks,
Matt

From: Clearwater [mailto:[email protected]] On Behalf Of [email protected]
Sent: 30 May 2016 08:32
To: [email protected]
Subject: [Project Clearwater] Second Sprout issue

Hi Chris,

Thanks for your continued support. The links that you sent me didn't answer my questions, so could you please send me other links about redundancy, load balancing and scalability in Clearwater?

After I turned off the first Sprout (10.0.0.4), I can't register users any more, even though the second Sprout (10.0.0.3) is on. Please see the attachments. I want the second Sprout node to take the place of the first Sprout while it's off.
Best Regards,
Asra

_________________________________________________________________________________________________________________________
This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.
_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
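The two Bono log checks suggested in this thread (whether DNS returns both Sprouts, and whether connections to each Sprout succeed) can be scripted. What follows is a minimal Python sketch that assumes the log line formats match the excerpts above. Those excerpts come from a single test server, so the formats may differ between Clearwater releases. srv_targets and connection_summary are hypothetical helper names, not part of Clearwater. The sketch only covers SRV-based resolution; if your Sprout name resolves through plain A records, the SRV check will return nothing.

```python
import re
from collections import defaultdict

# SRV answer as printed by dnsparser.cpp in the Bono log excerpt:
# "<name> <ttl> IN SRV <priority> <weight> <port> <target>"
SRV_RE = re.compile(
    r"(?P<name>_sip\._\w+\.\S+)\s+(?P<ttl>\d+)\s+IN\s+SRV\s+"
    r"(?P<prio>\d+)\s+(?P<weight>\d+)\s+(?P<port>\d+)\s+(?P<target>\S+)"
)
# connection_pool.cpp: "Created transport <id> in slot <n> (<local> to <remote>)"
CREATED_RE = re.compile(
    r"Created transport (?P<tp>\S+) in slot \d+ \((?P<local>\S+) to (?P<remote>\S+)\)"
)
# connection_pool.cpp: "Transport <id> in slot <n> has connected"
CONNECTED_RE = re.compile(r"Transport (?P<tp>\S+) in slot \d+ has connected")


def srv_targets(log_text, name):
    """Return the set of (target, port) pairs seen in SRV answers for `name`."""
    targets = set()
    for m in SRV_RE.finditer(log_text):
        if m.group("name") == name:
            targets.add((m.group("target"), int(m.group("port"))))
    return targets


def connection_summary(log_text):
    """Map each remote address to (transports created, transports connected)."""
    remote_of = {}                 # transport id -> remote "ip:port"
    created = defaultdict(int)
    connected = defaultdict(int)
    for line in log_text.splitlines():
        m = CREATED_RE.search(line)
        if m:
            # A later "Created" line for a reused id overwrites the old mapping.
            remote_of[m.group("tp")] = m.group("remote")
            created[m.group("remote")] += 1
            continue
        m = CONNECTED_RE.search(line)
        if m and m.group("tp") in remote_of:
            connected[remote_of[m.group("tp")]] += 1
    return {r: (created[r], connected[r]) for r in created}
```

As an example, calling srv_targets on the Bono log text with your own Sprout SRV name should return one (target, port) pair per Sprout if DNS is returning both. In connection_summary, a remote address whose connected count stays below its created count suggests that connection attempts to that Sprout are failing.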
