I forgot to say that the problems show up only with the double deployment; that's why I think it could be a clustering problem.
2014-12-01 10:03 GMT+01:00 Antonio Ken Iannillo <[email protected]>:

> Names resolve correctly.
>
> I made two tests with 50 calls: one with the single deployment (one copy
> of each component) and one with the double deployment (two copies).
>
> Hope they can help... I'm giving them a deep look right now.
>
> https://app.box.com/s/irvmyhphgoqwm3avvd9m
> https://app.box.com/s/mjxf73f1kmbc2ntxqi63
>
> 2014-11-28 20:52 GMT+01:00 Eleanor Merry <[email protected]>:
>
>> It looks like there are connection issues between the Sprouts and
>> Homesteads. Can you check that the Sprout can resolve the Homestead
>> address? The Clearwater processes look in /etc/resolv.conf for DNS
>> configuration. If your nodes don't use localhost (try running nslookup on
>> a hostname and see whether the Server value is 127.0.0.1), you'll need to
>> update the dnsmasq configuration to point to this server rather than
>> localhost. You can find details of how to do this at:
>> https://github.com/Metaswitch/clearwater-docs/wiki/Clearwater-DNS-Usage#client-configuration
>>
>> Can you also please send me the debug logs for this call from Sprout and
>> Homestead? You can turn on debug logging by creating/editing the file
>> /etc/clearwater/user_settings, adding log_level=5, and then restarting
>> the component (service <component> stop - it's automatically restarted by
>> monit). The logs are output in /var/log/<component>/*. You won't want to
>> have stress running through the node with debug logs on, though - can you
>> either try making a standard call on your deployment, or set count= in
>> /etc/clearwater/config on your stress node to a low number?
>>
>> The reported Cassandra errors are benign. They are caused by the
>> Cassandra process not having quite started up when the Homestead process
>> started. This meant Homestead couldn't initialize its cache. After a
>> couple of seconds, once Cassandra is up, Homestead can successfully
>> initialize its cache and so starts up properly.
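[Editor's note: the benign-error behaviour Ellie describes — Homestead retrying its cache initialization until Cassandra accepts connections — is the retry-on-connection-refused pattern. A minimal Python sketch of that pattern, using a hypothetical FakeStore stand-in rather than a real Cassandra/Thrift client:]

```python
import time

def start_cache(connect, max_attempts=5, delay_s=0.01):
    """Retry cache initialization until the backing store accepts
    connections, mimicking how Homestead recovers once Cassandra is up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionRefusedError:
            if attempt == max_attempts:
                raise  # store never came up; give up for real
            time.sleep(delay_s)  # back off briefly before retrying

class FakeStore:
    """Stand-in for Cassandra: refuses the first `failures` attempts."""
    def __init__(self, failures=2):
        self.failures = failures

    def connect(self):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionRefusedError("connect() failed: Connection refused")
        return "cache initialized"

store = FakeStore()
result = start_cache(store.connect)
print(result)  # succeeds on the third attempt, once the store is "up"
```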
>>
>> Ellie
>>
>> *From:* Antonio Ken Iannillo [mailto:[email protected]]
>> *Sent:* 28 November 2014 16:13
>> *To:* [email protected]
>> *Cc:* Eleanor Merry
>> *Subject:* Re: [Clearwater] Connection Timeout on Large Scale Deployment
>>
>> I also found this error in the sprout log:
>>
>> 28-11-2014 16:04:07.905 UTC Error httpconnection.cpp:569: http://hs.cini-clearwater.com:8888/impu/sip%3A2010000140%40cini-clearwater.com/reg-data failed at server 192.168.1.113 : HTTP response code said error (22 400) : fatal
>> 28-11-2014 16:04:07.906 UTC Error httpconnection.cpp:667: cURL failure with cURL error code 22 (see man 3 libcurl-errors) and HTTP error code 400
>> 28-11-2014 16:04:07.906 UTC Error hssconnection.cpp:565: Could not get subscriber data from HSS
>> 28-11-2014 16:04:08.128 UTC Error httpconnection.cpp:569: http://hs.cini-clearwater.com:8888/impu/sip%3A2010000030%40cini-clearwater.com/reg-data failed at server 192.168.1.113 : HTTP response code said error (22 400) : fatal
>> 28-11-2014 16:04:08.128 UTC Error httpconnection.cpp:667: cURL failure with cURL error code 22 (see man 3 libcurl-errors) and HTTP error code 400
>> 28-11-2014 16:04:08.128 UTC Error hssconnection.cpp:565: Could not get subscriber data from HSS
>>
>> 2014-11-28 16:21 GMT+01:00 Antonio Ken Iannillo <[email protected]>:
>>
>> ---------- Forwarded message ----------
>> From: *Antonio Ken Iannillo* <[email protected]>
>> Date: 2014-11-28 16:19 GMT+01:00
>> Subject: Re: [Clearwater] Connection Timeout on Large Scale Deployment
>> To: Eleanor Merry <[email protected]>
>>
>> I solved that problem... I think the Clearwater nodes internally talk
>> directly to dnsmasq in some way to resolve names. I had used a different
>> approach, and when I configured dnsmasq back everything worked again...
>> but now I get another strange unexpected message.
>> 2014-11-28 15:54:26:984 1417186466.984707: Aborting call on unexpected
>> message for Call-Id '[email protected]': while expecting
>> 'INVITE' (index 16), received 'SIP/2.0 480 Temporarily Unavailable
>> Via: SIP/2.0/TCP 192.168.1.107:39003;rport=39003;received=192.168.1.107;branch=z9hG4bK-2010000000-100-1.000000-1
>> Record-Route: <sip:sprout.cini-clearwater.com:5054;transport=TCP;lr;service=scscf;billing-role=charge-term>
>> Record-Route: <sip:sprout.cini-clearwater.com:5054;transport=TCP;lr;service=scscf;billing-role=charge-orig>
>> Record-Route: <sip:192.168.1.121:5058;transport=TCP;lr>
>> Record-Route: <sip:[email protected]:5060;transport=TCP;lr>
>> Call-ID: 2010000000-1.000000///[email protected]
>> From: <sip:[email protected]>;tag=1787SIPpTag001001234
>> To: <sip:[email protected]>;tag=z9hG4bKPjdoa02taVAMk8fNAOlz8drD7MrbjflAFY
>> CSeq: 5 INVITE
>> Content-Length: 0
>> '.
>>
>> I already followed the Clearwater troubleshooting guide, and the only
>> thing I have found so far is in the homestead log during start-up.
>> The error messages:
>>
>> 28-11-2014 15:13:36.580 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused
>> 28-11-2014 15:13:36.580 UTC Error main.cpp:472: Failed to initialize cache - rc 3
>> 28-11-2014 15:13:38.042 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused
>> 28-11-2014 15:13:38.042 UTC Error main.cpp:472: Failed to initialize cache - rc 3
>>
>> The full log is:
>>
>> 28-11-2014 15:13:36.572 UTC Status main.cpp:418: Log level set to 2
>> 28-11-2014 15:13:36.579 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead
>> 28-11-2014 15:13:36.579 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
>> 28-11-2014 15:13:36.579 UTC Status httpresolver.cpp:50: Created HTTP resolver
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:143: Configuring store
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:144: Hostname: localhost
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:145: Port: 9160
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:146: Threads: 10
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:147: Max Queue: 0
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:162: Starting store
>> 28-11-2014 15:13:36.580 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused
>> 28-11-2014 15:13:36.580 UTC Error main.cpp:472: Failed to initialize cache - rc 3
>> 28-11-2014 15:13:36.580 UTC Status cassandra_store.cpp:204: Stopping cache
>> 28-11-2014 15:13:36.580 UTC Status cassandra_store.cpp:214: Waiting for cache to stop
>> 28-11-2014 15:13:38.037 UTC Status main.cpp:418: Log level set to 2
>> 28-11-2014 15:13:38.037 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead
>> 28-11-2014 15:13:38.038 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
>> 28-11-2014 15:13:38.038 UTC Status httpresolver.cpp:50: Created HTTP resolver
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:143: Configuring store
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:144: Hostname: localhost
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:145: Port: 9160
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:146: Threads: 10
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:147: Max Queue: 0
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:162: Starting store
>> 28-11-2014 15:13:38.042 UTC Error cassandra_store.cpp:170: Cache caught TTransportException: connect() failed: Connection refused
>> 28-11-2014 15:13:38.042 UTC Error main.cpp:472: Failed to initialize cache - rc 3
>> 28-11-2014 15:13:38.042 UTC Status cassandra_store.cpp:204: Stopping cache
>> 28-11-2014 15:13:38.042 UTC Status cassandra_store.cpp:214: Waiting for cache to stop
>> 28-11-2014 15:13:48.070 UTC Status main.cpp:418: Log level set to 2
>> 28-11-2014 15:13:48.070 UTC Status main.cpp:438: Access logging enabled to /var/log/homestead
>> 28-11-2014 15:13:48.071 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
>> 28-11-2014 15:13:48.071 UTC Status httpresolver.cpp:50: Created HTTP resolver
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:143: Configuring store
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:144: Hostname: localhost
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:145: Port: 9160
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:146: Threads: 10
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:147: Max Queue: 0
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:162: Starting store
>> 28-11-2014 15:13:48.073 UTC Status diameterstack.cpp:63: Initializing Diameter stack
>> 28-11-2014 15:13:48.073 UTC Status diameterstack.cpp:307: Configuring Diameter stack from file /var/lib/homestead/homestead.conf
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter: All extensions loaded.
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter: freeDiameter configuration:
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter: Default trace level .... : +3
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter: Configuration file ..... : /var/lib/homestead/homestead.conf
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter: Diameter Identity ...... : 192.168.1.113 (l:13)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Diameter Realm ......... : hs.cini-clearwater.com (l:22)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Tc Timer ............... : 30
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Tw Timer ............... : 30
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local port ............. : 3868
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local secure port ...... : 5658
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of SCTP streams . : 3
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of clients thr .. : 5
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Number of app threads .. : 4
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local endpoints ........ : Default (use all available)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Local applications ..... : (none)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Flags : - IP ........... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - IPv6 ......... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - Relay app .... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - TCP .......... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - SCTP ......... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - Pref. proto .. : SCTP
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - TLS method ... : Separate port
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: TLS : - Certificate .. : /var/lib/homestead/cert.pem
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - Private key .. : /var/lib/homestead/privkey.pem
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - CA (trust) ... : /var/lib/homestead/ca.pem (1 certs)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - CRL .......... : (none)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - Priority ..... : (default: 'NORMAL')
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: - DH bits ...... : 1024
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Origin-State-Id ........ : 1417187628
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dbg_monitor.fdx'[(no config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_nasreq.fdx'[(no config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_sip.fdx'[(no config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_dcca.fdx'[(no config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions: '/usr/share/clearwater/homestead/lib/freeDiameter//dict_dcca_3gpp.fdx'[(no config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: {signal:12}'dbg_monitor'->0x7faf1c2ebd70
>> 28-11-2014 15:13:48.281 UTC Status diameterstack.cpp:432: Starting Diameter stack
>> 28-11-2014 15:13:48.288 UTC Status freeDiameter: Local server address(es): 192.168.1.113{---L-}
>> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:76: Configuring HssCacheTask
>> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:77: Dest-Realm: cini-clearwater.com
>> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:78: Dest-Host:
>> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:79: Server-Name: sip:sprout.cini-clearwater.com:5054;transport=TCP
>> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:131: Configuring HTTP stack
>> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:132: Bind address: 192.168.1.113
>> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:133: Bind port: 8888
>> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:134: Num threads: 50
>> 28-11-2014 15:13:48.547 UTC Status main.cpp:597: Start-up complete - wait for termination signal
>>
>> 2014-11-28 10:36 GMT+01:00 Antonio Ken Iannillo <[email protected]>:
>>
>> I checked the bono node and I found in its log:
>>
>> UTC Error connection_pool.cpp:217: Failed to resolve sprout.cini-clearwater.com to an IP address - Not found (PJ_ENOTFOUND)
>>
>> But I can ping all the other machines in the deployment by their symbolic
>> names (even sprout.cini-clearwater.com), so I can't understand this
>> error.
>>
>> Any idea?
>>
>> Antonio Ken Iannillo
>>
>> 2014-11-27 19:53 GMT+01:00 Eleanor Merry <[email protected]>:
>>
>> Hi,
>>
>> The stress tests shouldn't time out like that. Can you see whether either
>> of your Bonos receives the REGISTERs? If they don't, can you check that
>> the stress node can resolve the Bono address (on a sip stress node this
>> will be either the home_domain, or (one of) the bono_servers set in
>> /etc/clearwater/config), and that the Bono nodes will accept traffic from
>> the stress nodes.
>>
>> Thanks for the documentation pointer - I'll update this.
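[Editor's note: the resolution check Ellie suggests can be scripted. A minimal sketch using Python's standard socket module; the hostname checked below is localhost, since the deployment hostnames in this thread (e.g. sprout.cini-clearwater.com) only resolve inside that deployment:]

```python
import socket

def resolves(hostname):
    """Return the resolved IP addresses for hostname, or an empty list if
    resolution fails (the PJ_ENOTFOUND case Bono logged above)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address string.
        return sorted({info[4][0] for info in infos})
    except socket.gaierror:
        return []

addresses = resolves("localhost")
print(addresses)
```

Note that this exercises the system resolver (which honours /etc/resolv.conf, and hence dnsmasq), whereas ping succeeding may also reflect /etc/hosts entries - which would explain a host that pings by name while a process using DNS fails to resolve it.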
>> Ellie
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Antonio Ken Iannillo
>> Sent: 26 November 2014 18:43
>> To: [email protected]
>> Subject: [Clearwater] Connection Timeout on Large Scale Deployment
>>
>> Hi all,
>>
>> I manually installed a larger-scale deployment (2 copies of every service
>> but Ellis) and I ran the stress sipp test.
>>
>> I got all timeouts, which I didn't get with a simple deployment. Do you
>> think this is down to greater network delay, or could I have
>> misconfigured something?
>>
>> In the sipp error logs I got something like:
>>
>> > 2014-11-26 19:27:16:661 1417026436.661073: Aborting call on unexpected
>> > message for Call-Id '[email protected]': while expecting '401'
>> > (index 3), received 'SIP/2.0 408 Request Timeout
>> > Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24582;received=192.168.1.107;branch=z9hG4bK-9941-17-2-2010000166-
>> > Call-ID: 2010000166///[email protected]
>> > From: <sip:[email protected]>;tag=9941SIPpTag0017
>> > To: <sip:[email protected]>;tag=z9hG4bK-9941-17-2-2010000166-
>> > CSeq: 1 REGISTER
>> > Content-Length: 0
>> > '.
>>
>> while in the sipp message log I have:
>>
>> > Problem EAGAIN on socket 12 and poll_idx is 2
>> > Added first buffered message to socket 12
>> > Problem EWOULDBLOCK on socket 12 and poll_idx is 2
>> > Exit problem event on socket 12
>> > Wrote 702 of 702 bytes in an output buffer.
>> > -----------------------------------------------
>> > 2014-11-26 19:27:11:27.944 TCP message received [363] bytes :
>> > SIP/2.0 408 Request Timeout
>> > Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24578;received=192.168.1.107;branch=z9hG4bK-9941-23-2-2010000154-
>> > Call-ID: 2010000154///[email protected]
>> > From: <sip:[email protected]>;tag=9941SIPpTag0023
>> > To: <sip:[email protected]>;tag=z9hG4bK-9941-23-2-2010000154-
>> > CSeq: 1 REGISTER
>> > Content-Length: 0
>> >
>> > -----------------------------------------------
>> > Unexpected TCP message received:
>> > SIP/2.0 408 Request Timeout
>> > Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24578;received=192.168.1.107;branch=z9hG4bK-9941-23-2-2010000154-
>> > Call-ID: 2010000154///[email protected]
>> > From: <sip:[email protected]>;tag=9941SIPpTag0023
>> > To: <sip:[email protected]>;tag=z9hG4bK-9941-23-2-2010000154-
>> > CSeq: 1 REGISTER
>> > Content-Length: 0
>>
>> I tried increasing the send and receive timeouts in the sipp command, but
>> the results didn't change.
>>
>> P.S.: Feedback on the documentation --> in the description of the large
>> scale deployment, while clustering homestead one should uninstall both
>> homestead and homestead-prov in order to inject both schemas into
>> Cassandra. In the doc only homestead is specified.
>>
>> _______________________________________________
>> Clearwater mailing list
>> [email protected]
>> http://lists.projectclearwater.org/listinfo/clearwater

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater
