Names resolve correctly. I ran two tests with 50 calls each: one against the single deployment (one copy of each component) and one against the double deployment (two copies of each).
Hope they can help... I'm giving them a deep look right now.

https://app.box.com/s/irvmyhphgoqwm3avvd9m
https://app.box.com/s/mjxf73f1kmbc2ntxqi63

2014-11-28 20:52 GMT+01:00 Eleanor Merry <[email protected]>:

> It looks like there are connection issues between the Sprouts and Homesteads. Can you check that the Sprout can resolve the Homestead address? The clearwater processes look in /etc/resolv.conf for DNS configuration. If your nodes don't use localhost (try running nslookup on a hostname and see if the Server value is 127.0.0.1) you'll need to update the dnsmasq configuration to point to this server rather than localhost. You can find details of how to do this at:
> https://github.com/Metaswitch/clearwater-docs/wiki/Clearwater-DNS-Usage#client-configuration
>
> Can you also please send me the debug logs for this call from Sprout and Homestead. You can turn on debug logging by creating/editing the file /etc/clearwater/user_settings, adding log_level=5 and then restarting the component (service <component> stop - it's automatically restarted by monit). The logs are output in /var/log/<component>/*. You won't want to have stress running through the node with debug logs on though - can you either try making a standard call on your deployment, or set count= in /etc/clearwater/config on your stress node to a low number.
>
> The reported Cassandra errors are benign. They are caused by the Cassandra process not having quite started up when the Homestead process started. This meant Homestead couldn't initialize its cache. After a couple of seconds, once Cassandra is up, Homestead can successfully initialize its cache and so starts up properly.
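Ellie's resolver check above boils down to a small test. This is only a sketch: on a real Sprout node you would inspect /etc/resolv.conf itself (and the "Server:" line of nslookup output); the sample line below is an assumed value so the example is self-contained.

```shell
# Sketch of the DNS check described above. On a real node, replace the
# sample string with the contents of /etc/resolv.conf. The 127.0.0.1
# value here is an assumption for illustration only.
sample_resolv_conf='nameserver 127.0.0.1'

if printf '%s\n' "$sample_resolv_conf" | grep -q '^nameserver[[:space:]]*127\.0\.0\.1'; then
    echo 'resolver is local dnsmasq'
else
    echo 'resolver is external - repoint dnsmasq per the DNS usage docs'
fi
```

If this prints "resolver is external", the dnsmasq client-configuration steps in the wiki page linked above apply.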
> Ellie
>
> *From:* Antonio Ken Iannillo [mailto:[email protected]]
> *Sent:* 28 November 2014 16:13
> *To:* [email protected]
> *Cc:* Eleanor Merry
> *Subject:* Re: [Clearwater] Connection Timeout on Large Scale Deployment
>
> I also found this error in the sprout log:
>
> 28-11-2014 16:04:07.905 UTC Error httpconnection.cpp:569: http://hs.cini-clearwater.com:8888/impu/sip%3A2010000140%40cini-clearwater.com/reg-data failed at server 192.168.1.113 : HTTP response code said error (22 400) : fatal
> 28-11-2014 16:04:07.906 UTC Error httpconnection.cpp:667: cURL failure with cURL error code 22 (see man 3 libcurl-errors) and HTTP error code 400
> 28-11-2014 16:04:07.906 UTC Error hssconnection.cpp:565: Could not get subscriber data from HSS
> 28-11-2014 16:04:08.128 UTC Error httpconnection.cpp:569: http://hs.cini-clearwater.com:8888/impu/sip%3A2010000030%40cini-clearwater.com/reg-data failed at server 192.168.1.113 : HTTP response code said error (22 400) : fatal
> 28-11-2014 16:04:08.128 UTC Error httpconnection.cpp:667: cURL failure with cURL error code 22 (see man 3 libcurl-errors) and HTTP error code 400
> 28-11-2014 16:04:08.128 UTC Error hssconnection.cpp:565: Could not get subscriber data from HSS
>
> 2014-11-28 16:21 GMT+01:00 Antonio Ken Iannillo <[email protected]>:
>
> ---------- Forwarded message ----------
> From: Antonio Ken Iannillo <[email protected]>
> Date: 2014-11-28 16:19 GMT+01:00
> Subject: Re: [Clearwater] Connection Timeout on Large Scale Deployment
> To: Eleanor Merry <[email protected]>
>
> I solved that problem... I think the clearwater nodes internally talk in some way directly with dnsmasq to resolve the names. I used a different approach, and when I configured dnsmasq back everything worked again... but I got another strange unexpected message:
>
> '.
> 2014-11-28 15:54:26:984 1417186466.984707: Aborting call on unexpected message for Call-Id '[email protected]': while expecting 'INVITE' (index 16), received 'SIP/2.0 480 Temporarily Unavailable
> Via: SIP/2.0/TCP 192.168.1.107:39003;rport=39003;received=192.168.1.107;branch=z9hG4bK-2010000000-100-1.000000-1
> Record-Route: <sip:sprout.cini-clearwater.com:5054;transport=TCP;lr;service=scscf;billing-role=charge-term>
> Record-Route: <sip:sprout.cini-clearwater.com:5054;transport=TCP;lr;service=scscf;billing-role=charge-orig>
> Record-Route: <sip:192.168.1.121:5058;transport=TCP;lr>
> Record-Route: <sip:[email protected]:5060;transport=TCP;lr>
> Call-ID: 2010000000-1.000000///[email protected]
> From: <sip:[email protected]>;tag=1787SIPpTag001001234
> To: <sip:[email protected]>;tag=z9hG4bKPjdoa02taVAMk8fNAOlz8drD7MrbjflAFY
> CSeq: 5 INVITE
> Content-Length: 0
> '.
>
> I already followed the guide for troubleshooting clearwater, and the only thing I found till now is in the homestead log during start-up.
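The debug-logging step suggested earlier in the thread (log_level=5 in /etc/clearwater/user_settings) amounts to the following. As a sketch, a temp file stands in for the real settings file so the example is self-contained; the restart is shown as a comment because it needs a real node.

```shell
# Sketch of enabling Clearwater debug logs (paths from the thread; a temp
# file stands in for /etc/clearwater/user_settings so this is runnable
# anywhere).
settings=$(mktemp)

echo 'log_level=5' >> "$settings"   # 5 = debug; the logs above show the default, 2

grep '^log_level=' "$settings"      # confirm the setting took

# On a real node you would then stop the process and let monit restart it,
# e.g.:  sudo service homestead stop
# Debug output then appears under /var/log/homestead/.
rm -f "$settings"
```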
> The error messages:
>
> 28-11-2014 15:13:36.580 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused
> 28-11-2014 15:13:36.580 UTC Error main.cpp:472: Failed to initialize cache - rc 3
> 28-11-2014 15:13:38.042 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused
> 28-11-2014 15:13:38.042 UTC Error main.cpp:472: Failed to initialize cache - rc 3
>
> The full log is:
>
> 28-11-2014 15:13:36.572 UTC Status main.cpp:418: Log level set to 2
> 28-11-2014 15:13:36.579 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead
> 28-11-2014 15:13:36.579 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
> 28-11-2014 15:13:36.579 UTC Status httpresolver.cpp:50: Created HTTP resolver
> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:143: Configuring store
> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:144: Hostname: localhost
> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:145: Port: 9160
> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:146: Threads: 10
> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:147: Max Queue: 0
> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:162: Starting store
> 28-11-2014 15:13:36.580 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused
> 28-11-2014 15:13:36.580 UTC Error main.cpp:472: Failed to initialize cache - rc 3
> 28-11-2014 15:13:36.580 UTC Status cassandra_store.cpp:204: Stopping cache
> 28-11-2014 15:13:36.580 UTC Status cassandra_store.cpp:214: Waiting for cache to stop
> 28-11-2014 15:13:38.037 UTC Status main.cpp:418: Log level set to 2
> 28-11-2014 15:13:38.037 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead
> 28-11-2014 15:13:38.038 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
> 28-11-2014 15:13:38.038 UTC Status httpresolver.cpp:50: Created HTTP resolver
> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:143: Configuring store
> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:144: Hostname: localhost
> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:145: Port: 9160
> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:146: Threads: 10
> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:147: Max Queue: 0
> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:162: Starting store
> 28-11-2014 15:13:38.042 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused
> 28-11-2014 15:13:38.042 UTC Error main.cpp:472: Failed to initialize cache - rc 3
> 28-11-2014 15:13:38.042 UTC Status cassandra_store.cpp:204: Stopping cache
> 28-11-2014 15:13:38.042 UTC Status cassandra_store.cpp:214: Waiting for cache to stop
> 28-11-2014 15:13:48.070 UTC Status main.cpp:418: Log level set to 2
> 28-11-2014 15:13:48.070 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead
> 28-11-2014 15:13:48.071 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
> 28-11-2014 15:13:48.071 UTC Status httpresolver.cpp:50: Created HTTP resolver
> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:143: Configuring store
> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:144: Hostname: localhost
> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:145: Port: 9160
> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:146: Threads: 10
> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:147: Max Queue: 0
> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:162: Starting store
> 28-11-2014 15:13:48.073 UTC Status diameterstack.cpp:63: Initializing Diameter stack
> 28-11-2014 15:13:48.073 UTC Status diameterstack.cpp:307: Configuring Diameter stack from file /var/lib/homestead/homestead.conf
> 28-11-2014 15:13:48.280 UTC Status freeDiameter: All extensions loaded.
> 28-11-2014 15:13:48.280 UTC Status freeDiameter: freeDiameter configuration:
> 28-11-2014 15:13:48.280 UTC Status freeDiameter: Default trace level .... : +3
> 28-11-2014 15:13:48.280 UTC Status freeDiameter: Configuration file ..... : /var/lib/homestead/homestead.conf
> 28-11-2014 15:13:48.280 UTC Status freeDiameter: Diameter Identity ...... : 192.168.1.113 (l:13)
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Diameter Realm ......... : hs.cini-clearwater.com (l:22)
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Tc Timer ............... : 30
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Tw Timer ............... : 30
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local port ............. : 3868
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local secure port ...... : 5658
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of SCTP streams . : 3
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of clients thr .. : 5
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of app threads .. : 4
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local endpoints ........ : Default (use all available)
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local applications ..... : (none)
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Flags : - IP ........... : Enabled
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:         - IPv6 ......... : Enabled
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:         - Relay app .... : Enabled
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:         - TCP .......... : Enabled
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:         - SCTP ......... : Enabled
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:         - Pref. proto .. : SCTP
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:         - TLS method ... : Separate port
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: TLS : - Certificate .. : /var/lib/homestead/cert.pem
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:       - Private key .. : /var/lib/homestead/privkey.pem
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:       - CA (trust) ... : /var/lib/homestead/ca.pem (1 certs)
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:       - CRL .......... : (none)
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:       - Priority ..... : (default: 'NORMAL')
> 28-11-2014 15:13:48.281 UTC Status freeDiameter:       - DH bits ...... : 1024
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Origin-State-Id ........ : 1417187628
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dbg_monitor.fdx'[(no config file)], loaded
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_nasreq.fdx'[(no config file)], loaded
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_sip.fdx'[(no config file)], loaded
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_dcca.fdx'[(no config file)], loaded
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_dcca_3gpp.fdx'[(no config file)], loaded
> 28-11-2014 15:13:48.281 UTC Status freeDiameter: {signal:12}'dbg_monitor'->0x7faf1c2ebd70
> 28-11-2014 15:13:48.281 UTC Status diameterstack.cpp:432: Starting Diameter stack
> 28-11-2014 15:13:48.288 UTC Status freeDiameter: Local server address(es): 192.168.1.113{---L-}
> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:76: Configuring HssCacheTask
> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:77: Dest-Realm: cini-clearwater.com
> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:78: Dest-Host:
> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:79: Server-Name: sip:sprout.cini-clearwater.com:5054;transport=TCP
> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:131: Configuring HTTP stack
> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:132: Bind address: 192.168.1.113
> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:133: Bind port: 8888
> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:134: Num threads: 50
> 28-11-2014 15:13:48.547 UTC Status main.cpp:597: Start-up complete - wait for termination signal

2014-11-28 10:36 GMT+01:00 Antonio Ken Iannillo <[email protected]>:

> I checked the bono node and I found in its log:
>
> UTC Error connection_pool.cpp:217: Failed to resolve sprout.cini-clearwater.com to an IP address - Not found (PJ_ENOTFOUND)
>
> But I can ping all the other machines in the deployment by their symbolic names (even sprout.cini-clearwater.com), so I can't understand this error.
>
> Any idea?
>
> Antonio Ken Iannillo

2014-11-27 19:53 GMT+01:00 Eleanor Merry <[email protected]>:

> Hi,
>
> The stress tests shouldn't time out like that. Can you see if either of your Bonos receives the REGISTERs? If they don't, can you check that the stress node can resolve the Bono address (on a sip stress node, this will be either the home_domain, or (one of the) bono_servers set in /etc/clearwater/config), and that the Bono nodes will accept traffic from the stress nodes.
>
> Thanks for the documentation pointer - I'll update this.
>
> Ellie
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Antonio Ken Iannillo
> Sent: 26 November 2014 18:43
> To: [email protected]
> Subject: [Clearwater] Connection Timeout on Large Scale Deployment
>
> Hi all,
>
> I manually installed a larger scale deployment (2 copies of every service but ellis) and I ran the stress sipp test.
>
> I got all timeouts, but I didn't get them with a simple deployment.
> Do you think it depends on a greater network delay, or maybe I could have misconfigured something?
>
> In the sipp error logs I got something like:
>
> '.
> 2014-11-26 19:27:16:661 1417026436.661073: Aborting call on unexpected message for Call-Id '[email protected]': while expecting '401' (index 3), received 'SIP/2.0 408 Request Timeout
> Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24582;received=192.168.1.107;branch=z9hG4bK-9941-17-2-2010000166-
> Call-ID: 2010000166///[email protected]
> From: <sip:[email protected]>;tag=9941SIPpTag0017
> To: <sip:[email protected]>;tag=z9hG4bK-9941-17-2-2010000166-
> CSeq: 1 REGISTER
> Content-Length: 0
> '.
>
> while in the sipp message log I have:
>
> Problem EAGAIN on socket 12 and poll_idx is 2
> Added first buffered message to socket 12
> Problem EWOULDBLOCK on socket 12 and poll_idx is 2
> Exit problem event on socket 12
> Wrote 702 of 702 bytes in an output buffer.
> -----------------------------------------------
> 2014-11-26 19:27:11:27.944 TCP message received [363] bytes :
> SIP/2.0 408 Request Timeout
> Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24578;received=192.168.1.107;branch=z9hG4bK-9941-23-2-2010000154-
> Call-ID: 2010000154///[email protected]
> From: <sip:[email protected]>;tag=9941SIPpTag0023
> To: <sip:[email protected]>;tag=z9hG4bK-9941-23-2-2010000154-
> CSeq: 1 REGISTER
> Content-Length: 0
>
> -----------------------------------------------
> Unexpected TCP message received:
> SIP/2.0 408 Request Timeout
> Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24578;received=192.168.1.107;branch=z9hG4bK-9941-23-2-2010000154-
> Call-ID: 2010000154///[email protected]
> From: <sip:[email protected]>;tag=9941SIPpTag0023
> To: <sip:[email protected]>;tag=z9hG4bK-9941-23-2-2010000154-
> CSeq: 1 REGISTER
> Content-Length: 0
>
> I tried to increase the send and receive timeouts in the sipp command, but the results didn't change.
>
> P.S.: Feedback on documentation --> in the description of the large scale deployment, while clustering homestead one should uninstall both homestead and homestead-prov in order to inject both schemas into cassandra. In the doc only homestead is specified.
>
> _______________________________________________
> Clearwater mailing list
> [email protected]
> http://lists.projectclearwater.org/listinfo/clearwater
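For reference, the benign Cassandra race Ellie described can be pictured as a retry loop: homestead's cache init fails with "connect() failed: Connection refused" until Cassandra's Thrift port (9160, per the homestead log) accepts connections. This is only an illustration; port_open below is a stub standing in for a real probe such as `nc -z localhost 9160`, and the retry count is arbitrary.

```shell
# Illustration of the start-up race from the thread: cache init is retried
# until Cassandra's Thrift port (9160 in the homestead log) accepts
# connections. port_open is a stub for a real probe (e.g. `nc -z localhost
# 9160`); here it pretends Cassandra comes up on the 3rd try.
attempt=0
port_open() { [ "$attempt" -ge 3 ]; }

until port_open; do
    attempt=$((attempt + 1))
    echo "connect() failed: Connection refused - retry $attempt"
    # a real loop would sleep briefly here, e.g. sleep 2
done
echo "cache initialized after $attempt attempts"
```

This is why the errors disappear on their own once Cassandra finishes starting: nothing needs fixing, the next retry simply succeeds.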
