It looks like there are connection issues between the Sprouts and Homesteads. Can you check that the Sprout can resolve the Homestead address? The clearwater processes look in /etc/resolv.conf for DNS configuration. If your nodes don’t use localhost (try running nslookup on a hostname and see if the Server value is 127.0.0.1) you’ll need to update the dnsmasq configuration to point to this server rather than localhost. You can find details of how to do this at: https://github.com/Metaswitch/clearwater-docs/wiki/Clearwater-DNS-Usage#client-configuration
Can you also please send me the debug logs for this call from Sprout and Homestead. You can turn on debug logging by creating/editing the file /etc/clearwater/user_settings, adding log_level=5 and then restarting the component (service sprout <component> - it's automatically restarted by monit). The logs are output in /var/log/<component>/*. You won’t want to have stress running though the node with debug logs on though – can you either try making a standard call on your deployment, or set count= in /etc/clearwater/config on your stress node to a low number. The reported Cassandra errors are benign. They are caused by the Cassandra process not having quite started up when the Homestead process started. This meant Homestead couldn’t initialize its cache. After a couple of seconds, once Cassandra is up, Homestead can successfully initialize its cache and so starts up properly. Ellie From: Antonio Ken Iannillo [mailto:[email protected]] Sent: 28 November 2014 16:13 To: [email protected] Cc: Eleanor Merry Subject: Re: [Clearwater] Connection Timeout on Large Scale Deployment I also found this error in sprout log 28-11-2014 16:04:07.905 UTC Error httpconnection.cpp:569: http://hs.cini-clearwater.com:8888/impu/sip%3A2010000140%40cini-clearwater.com/reg-data failed at server 192.168.1.113 : HTTP response code said error (22 400) : fatal 28-11-2014 16:04:07.906 UTC Error httpconnection.cpp:667: cURL failure with cURL error code 22 (see man 3 libcurl-errors) and HTTP error code 400 28-11-2014 16:04:07.906 UTC Error hssconnection.cpp:565: Could not get subscriber data from HSS 28-11-2014 16:04:08.128 UTC Error httpconnection.cpp:569: http://hs.cini-clearwater.com:8888/impu/sip%3A2010000030%40cini-clearwater.com/reg-data failed at server 192.168.1.113 : HTTP response code said error (22 400) : fatal 28-11-2014 16:04:08.128 UTC Error httpconnection.cpp:667: cURL failure with cURL error code 22 (see man 3 libcurl-errors) and HTTP error code 400 28-11-2014 16:04:08.128 UTC Error hssconnection.cpp:565: Could not get subscriber data from HSS 2014-11-28 16:21 GMT+01:00 Antonio Ken Iannillo <[email protected]<mailto:[email protected]>>: ---------- Forwarded message ---------- From: Antonio Ken Iannillo <[email protected]<mailto:[email protected]>> Date: 2014-11-28 16:19 GMT+01:00 Subject: Re: [Clearwater] Connection Timeout on Large Scale Deployment To: Eleanor Merry <[email protected]<mailto:[email protected]>> I solved that problem... I think internally clearwater nodes talks in some way directly with dnsmasq to resolve the name. I used a different approach and when i configured back dnsmasq all worked again... but i get another strange unexpected message. '. 2014-11-28 15:54:26:984 1417186466.984707: Aborting call on unexpected message for Call-Id '[email protected]<mailto:[email protected]>': while expecting 'INVITE' (index 16), received 'SIP/2.0 480 Temporarily Unavailable Via: SIP/2.0/TCP 192.168.1.107:39003;rport=39003;received=192.168.1.107;branch=z9hG4bK-2010000000-100-1.000000-1 Record-Route: <sip:sprout.cini-clearwater.com:5054;transport=TCP;lr;service=scscf;billing-role=charge-term> Record-Route: <sip:sprout.cini-clearwater.com:5054;transport=TCP;lr;service=scscf;billing-role=charge-orig> Record-Route: <sip:192.168.1.121:5058;transport=TCP;lr> Record-Route: <sip:[email protected]:5060;transport=TCP;lr> Call-ID: 2010000000-1.000000///[email protected]<mailto:[email protected]> From: <sip:[email protected]<mailto:sip%[email protected]>>;tag=1787SIPpTag001001234 To: <sip:[email protected]<mailto:sip%[email protected]>>;tag=z9hG4bKPjdoa02taVAMk8fNAOlz8drD7MrbjflAFY CSeq: 5 INVITE Content-Length: 0 '. I already followed the guide for troubleshooting clearwater and the only thing i found 'till now is in the homestead log during the start up. the error messages 28-11-2014 15:13:36.580 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused 28-11-2014 15:13:36.580 UTC Error main.cpp:472: Failed to initialize cache - rc 3 28-11-2014 15:13:38.042 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused 28-11-2014 15:13:38.042 UTC Error main.cpp:472: Failed to initialize cache - rc 3 the full log is 28-11-2014 15:13:36.572 UTC Status main.cpp:418: Log level set to 2 28-11-2014 15:13:36.579 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead 28-11-2014 15:13:36.579 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1 28-11-2014 15:13:36.579 UTC Status httpresolver.cpp:50: Created HTTP resolver 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:143: Configuring store 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:144: Hostname: localhost 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:145: Port: 9160 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:146: Threads: 10 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:147: Max Queue: 0 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:162: Starting store 28-11-2014 15:13:36.580 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused 28-11-2014 15:13:36.580 UTC Error main.cpp:472: Failed to initialize cache - rc 3 28-11-2014 15:13:36.580 UTC Status cassandra_store.cpp:204: Stopping cache 28-11-2014 15:13:36.580 UTC Status cassandra_store.cpp:214: Waiting for cache to stop 28-11-2014 15:13:38.037 UTC Status main.cpp:418: Log level set to 2 28-11-2014 15:13:38.037 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead 28-11-2014 15:13:38.038 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1 28-11-2014 15:13:38.038 UTC Status httpresolver.cpp:50: Created HTTP resolver 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:143: Configuring store 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:144: Hostname: localhost 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:145: Port: 9160 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:146: Threads: 10 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:147: Max Queue: 0 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:162: Starting store 28-11-2014 15:13:38.042 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused 28-11-2014 15:13:38.042 UTC Error main.cpp:472: Failed to initialize cache - rc 3 28-11-2014 15:13:38.042 UTC Status cassandra_store.cpp:204: Stopping cache 28-11-2014 15:13:38.042 UTC Status cassandra_store.cpp:214: Waiting for cache to stop 28-11-2014 15:13:48.070 UTC Status main.cpp:418: Log level set to 2 28-11-2014 15:13:48.070 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead 28-11-2014 15:13:48.071 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1 28-11-2014 15:13:48.071 UTC Status httpresolver.cpp:50: Created HTTP resolver 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:143: Configuring store 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:144: Hostname: localhost 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:145: Port: 9160 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:146: Threads: 10 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:147: Max Queue: 0 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:162: Starting store 28-11-2014 15:13:48.073 UTC Status diameterstack.cpp:63: Initializing Diameter stack 28-11-2014 15:13:48.073 UTC Status diameterstack.cpp:307: Configuring Diameter stack from file /var/lib/homestead/homestead.conf 28-11-2014 15:13:48.280 UTC Status freeDiameter: All extensions loaded. 28-11-2014 15:13:48.280 UTC Status freeDiameter: freeDiameter configuration: 28-11-2014 15:13:48.280 UTC Status freeDiameter: Default trace level .... : +3 28-11-2014 15:13:48.280 UTC Status freeDiameter: Configuration file ..... : /var/lib/homestead/homestead.conf 28-11-2014 15:13:48.280 UTC Status freeDiameter: Diameter Identity ...... : 192.168.1.113 (l:13) 28-11-2014 15:13:48.281 UTC Status freeDiameter: Diameter Realm ......... : hs.cini-clearwater.com<http://hs.cini-clearwater.com> (l:22) 28-11-2014 15:13:48.281 UTC Status freeDiameter: Tc Timer ............... : 30 28-11-2014 15:13:48.281 UTC Status freeDiameter: Tw Timer ............... : 30 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local port ............. : 3868 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local secure port ...... : 5658 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of SCTP streams . : 3 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of clients thr .. : 5 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of app threads .. : 4 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local endpoints ........ : Default (use all available) 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local applications ..... : (none) 28-11-2014 15:13:48.281 UTC Status freeDiameter: Flags : - IP ........... : Enabled 28-11-2014 15:13:48.281 UTC Status freeDiameter: - IPv6 ......... : Enabled 28-11-2014 15:13:48.281 UTC Status freeDiameter: - Relay app .... : Enabled 28-11-2014 15:13:48.281 UTC Status freeDiameter: - TCP .......... : Enabled 28-11-2014 15:13:48.281 UTC Status freeDiameter: - SCTP ......... : Enabled 28-11-2014 15:13:48.281 UTC Status freeDiameter: - Pref. proto .. : SCTP 28-11-2014 15:13:48.281 UTC Status freeDiameter: - TLS method ... : Separate port 28-11-2014 15:13:48.281 UTC Status freeDiameter: TLS : - Certificate .. : /var/lib/homestead/cert.pem 28-11-2014 15:13:48.281 UTC Status freeDiameter: - Private key .. : /var/lib/homestead/privkey.pem 28-11-2014 15:13:48.281 UTC Status freeDiameter: - CA (trust) ... : /var/lib/homestead/ca.pem (1 certs) 28-11-2014 15:13:48.281 UTC Status freeDiameter: - CRL .......... : (none) 28-11-2014 15:13:48.281 UTC Status freeDiameter: - Priority ..... : (default: 'NORMAL') 28-11-2014 15:13:48.281 UTC Status freeDiameter: - DH bits ...... : 1024 28-11-2014 15:13:48.281 UTC Status freeDiameter: Origin-State-Id ........ : 1417187628 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dbg_monitor.fdx'[(no config file)], loaded 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_nasreq.fdx'[(no config file)], loaded 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_sip.fdx'[(no config file)], loaded 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_dcca.fdx'[(no config file)], loaded 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_dcca_3gpp.fdx'[(no config file)], loaded 28-11-2014 15:13:48.281 UTC Status freeDiameter: {signal:12}'dbg_monitor'->0x7faf1c2ebd70 28-11-2014 15:13:48.281 UTC Status diameterstack.cpp:432: Starting Diameter stack 28-11-2014 15:13:48.288 UTC Status freeDiameter: Local server address(es): 192.168.1.113{---L-} 28-11-2014 15:13:48.288 UTC Status handlers.cpp:76: Configuring HssCacheTask 28-11-2014 15:13:48.288 UTC Status handlers.cpp:77: Dest-Realm: cini-clearwater.com<http://cini-clearwater.com> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:78: Dest-Host: 28-11-2014 15:13:48.288 UTC Status handlers.cpp:79: Server-Name: sip:sprout.cini-clearwater.com:5054;transport=TCP 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:131: Configuring HTTP stack 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:132: Bind address: 192.168.1.113 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:133: Bind port: 8888 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:134: Num threads: 50 28-11-2014 15:13:48.547 UTC Status main.cpp:597: Start-up complete - wait for termination signal 2014-11-28 10:36 GMT+01:00 Antonio Ken Iannillo <[email protected]<mailto:[email protected]>>: I checked the bono node and I found in its log: UTC Error connection_pool.cpp:217: Failed to resolve sprout.cini-clearwater.com<http://sprout.cini-clearwater.com> to an IP address - Not found (PJ_ENOTFOUND) But I can ping all the other machines in the deployment with their symbolic name (even sprout.cini-clearwater.com<http://sprout.cini-clearwater.com>) so I cant' understand this error. Any idea? Antonio Ken Iannillo 2014-11-27 19:53 GMT+01:00 Eleanor Merry <[email protected]<mailto:[email protected]>>: Hi, The stress tests shouldn't time out like that. Can you see if either of your Bono's receives the REGISTERs? If they don't then can you check that the stress node can resolve the Bono address (on a sip stress node, this will be either the home_domain, or (one of the) bono_servers set in /etc/clearwater/config), and that the Bono nodes will accept traffic from the stress nodes. Thanks for the documentation pointer - I'll update this. Ellie -----Original Message----- From: [email protected]<mailto:[email protected]> [mailto:[email protected]<mailto:[email protected]>] On Behalf Of Antonio Ken Iannillo Sent: 26 November 2014 18:43 To: [email protected]<mailto:[email protected]> Subject: [Clearwater] Connection Timeout on Large Scale Deployment Hi all, I manually installed a larger scale deployment (2 copies of every service but ellis) and I run the stress sipp test. I got all timeouts, but I didn't get them with a simple deployment. Do you think it depends on a greater network delay or maybe i could have mis-configured something? In sipp error logs I got something like: '. > 2014-11-26 19:27:16:661 1417026436.661073: Aborting call on > unexpected message for Call-Id > '[email protected]<mailto:[email protected]>': while > expecting '401' (index 3), received 'SIP/2.0 408 Request Timeout > Via: SIP/2.0/TCP 192.168.1.107:5060<http://192.168.1.107:5060> > ;rport=24582;received=192.168.1.107;branch=z9hG4bK-9941-17-2-201000016 > 6- > Call-ID: 2010000166///[email protected]<mailto:[email protected]> > From: > <sip:[email protected]<mailto:sip%[email protected]>>;tag=9941SIPpTag0017 > To: > <sip:[email protected]<mailto:sip%[email protected]>>;tag=z9hG4bK-9941-17-2-2010000166- > CSeq: 1 REGISTER > Content-Length: 0 > '. while in the sipp message log i have: Problem EAGAIN on socket 12 and poll_idx is 2 > Added first buffered message to socket 12 Problem EWOULDBLOCK on > socket 12 and poll_idx is 2 Exit problem event on socket 12 Wrote > 702 of 702 bytes in an output buffer. > ----------------------------------------------- 2014-11-26 > 19:27:11:27.944 TCP message received [363] bytes : > SIP/2.0 408 Request Timeout > Via: SIP/2.0/TCP 192.168.1.107:5060<http://192.168.1.107:5060> > ;rport=24578;received=192.168.1.107;branch=z9hG4bK-9941-23-2-201000015 > 4- > Call-ID: 2010000154///[email protected]<mailto:[email protected]> > From: > <sip:[email protected]<mailto:sip%[email protected]>>;tag=9941SIPpTag0023 > To: > <sip:[email protected]<mailto:sip%[email protected]>>;tag=z9hG4bK-9941-23-2-2010000154- > CSeq: 1 REGISTER > Content-Length: 0 > > ----------------------------------------------- > Unexpected TCP message received: > SIP/2.0 408 Request Timeout > Via: SIP/2.0/TCP 192.168.1.107:5060<http://192.168.1.107:5060> > ;rport=24578;received=192.168.1.107;branch=z9hG4bK-9941-23-2-201000015 > 4- > Call-ID: 2010000154///[email protected]<mailto:[email protected]> > From: > <sip:[email protected]<mailto:sip%[email protected]>>;tag=9941SIPpTag0023 > To: > <sip:[email protected]<mailto:sip%[email protected]>>;tag=z9hG4bK-9941-23-2-2010000154- > CSeq: 1 REGISTER > Content-Length: 0 I tried to increase send and receive timeout in the sipp command but the results didn't change. P.S.: Feedback on documentation --> in the description of large scale deployment, while clustering homestead one should unistall both homestead and homestead-prov in order to inject both schemas in cassandra. In the doc only homestead is specified. _______________________________________________ Clearwater mailing list [email protected]<mailto:[email protected]> http://lists.projectclearwater.org/listinfo/clearwater _______________________________________________ Clearwater mailing list [email protected] http://lists.projectclearwater.org/listinfo/clearwater
