I forgot to say that the problems show up only with the double deployment;
that's why I think it could be a clustering problem.

2014-12-01 10:03 GMT+01:00 Antonio Ken Iannillo <
[email protected]>:

> Names resolve correctly.
>
> I made two tests with 50 calls: one with the single deployment (one copy of
> each component) and one with the double deployment (two copies).
>
> I hope they help... I'm taking a close look at them right now.
>
> https://app.box.com/s/irvmyhphgoqwm3avvd9m
>
> https://app.box.com/s/mjxf73f1kmbc2ntxqi63
>
> 2014-11-28 20:52 GMT+01:00 Eleanor Merry <[email protected]>:
>
>>  It looks like there are connection issues between the Sprouts and
>> Homesteads. Can you check that the Sprout can resolve the Homestead
>> address? The Clearwater processes look in /etc/resolv.conf for DNS
>> configuration. If your nodes don't use localhost (try running nslookup on
>> a hostname and check whether the Server value is 127.0.0.1), you'll need
>> to update the dnsmasq configuration to point to this server rather than
>> localhost. You can find details of how to do this at:
>> https://github.com/Metaswitch/clearwater-docs/wiki/Clearwater-DNS-Usage#client-configuration
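>>
>> For example, something along these lines on the Sprout node (just a
>> sketch; I've taken the hs.cini-clearwater.com name from your logs):
>>
>>   cat /etc/resolv.conf                       # which nameserver do the processes use?
>>   nslookup hs.cini-clearwater.com            # "Server:" should be 127.0.0.1 if dnsmasq is in use
>>   nslookup hs.cini-clearwater.com 127.0.0.1  # check that dnsmasq itself can resolve the name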
>>
>>
>>
>> Can you also please send me the debug logs for this call from Sprout and
>> Homestead. You can turn on debug logging by creating/editing the file
>> /etc/clearwater/user_settings, adding log_level=5 and then restarting the
>> component (service <component> stop - it's automatically restarted by
>> monit). The logs are output in /var/log/<component>/*. You won't want to
>> have stress running through the node with debug logging on, though; can
>> you either try making a standard call on your deployment, or set count=
>> in /etc/clearwater/config on your stress node to a low number.
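>>
>> Concretely, something like this (a sketch only; substitute homestead for
>> sprout as appropriate):
>>
>>   echo "log_level=5" | sudo tee -a /etc/clearwater/user_settings
>>   sudo service sprout stop    # monit spots the stopped process and restarts it with the new level
>>   tail -f /var/log/sprout/*   # debug output appears here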
>>
>>  The reported Cassandra errors are benign. They are caused by the
>> Cassandra process not having quite started up when the Homestead process
>> started. This meant Homestead couldn’t initialize its cache. After a couple
>> of seconds, once Cassandra is up, Homestead can successfully initialize its
>> cache and so starts up properly.
>>
>>
>>
>> Ellie
>>
>>
>>
>> *From:* Antonio Ken Iannillo [mailto:[email protected]]
>> *Sent:* 28 November 2014 16:13
>> *To:* [email protected]
>> *Cc:* Eleanor Merry
>>
>> *Subject:* Re: [Clearwater] Connection Timeout on Large Scale Deployment
>>
>>
>>
>> I also found this error in the sprout log:
>>
>>
>> 28-11-2014 16:04:07.905 UTC Error httpconnection.cpp:569:
>> http://hs.cini-clearwater.com:8888/impu/sip%3A2010000140%40cini-clearwater.com/reg-data
>> failed at server 192.168.1.113 : HTTP response code said error (22 400) :
>> fatal
>> 28-11-2014 16:04:07.906 UTC Error httpconnection.cpp:667: cURL failure
>> with cURL error code 22 (see man 3 libcurl-errors) and HTTP error code 400
>> 28-11-2014 16:04:07.906 UTC Error hssconnection.cpp:565: Could not get
>> subscriber data from HSS
>> 28-11-2014 16:04:08.128 UTC Error httpconnection.cpp:569:
>> http://hs.cini-clearwater.com:8888/impu/sip%3A2010000030%40cini-clearwater.com/reg-data
>> failed at server 192.168.1.113 : HTTP response code said error (22 400) :
>> fatal
>> 28-11-2014 16:04:08.128 UTC Error httpconnection.cpp:667: cURL failure
>> with cURL error code 22 (see man 3 libcurl-errors) and HTTP error code 400
>> 28-11-2014 16:04:08.128 UTC Error hssconnection.cpp:565: Could not get
>> subscriber data from HSS
>>
>>
>>
>>
>>
>> 2014-11-28 16:21 GMT+01:00 Antonio Ken Iannillo <
>> [email protected]>:
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: *Antonio Ken Iannillo* <[email protected]>
>> Date: 2014-11-28 16:19 GMT+01:00
>> Subject: Re: [Clearwater] Connection Timeout on Large Scale Deployment
>> To: Eleanor Merry <[email protected]>
>>
>>  I solved that problem... I think the Clearwater nodes internally talk
>> directly to dnsmasq in some way to resolve names. I had used a different
>> approach, and when I configured dnsmasq back everything worked again...
>> but now I get another strange, unexpected message.
>>
>>
>>
>>
>> '.
>> 2014-11-28 15:54:26:984 1417186466.984707: Aborting call on unexpected
>> message for Call-Id '[email protected]': while expecting 'INVITE'
>> (index 16), received 'SIP/2.0 480 Temporarily Unavailable
>> Via: SIP/2.0/TCP 192.168.1.107:39003;rport=39003;received=192.168.1.107;branch=z9hG4bK-2010000000-100-1.000000-1
>> Record-Route: <sip:sprout.cini-clearwater.com:5054;transport=TCP;lr;service=scscf;billing-role=charge-term>
>> Record-Route: <sip:sprout.cini-clearwater.com:5054;transport=TCP;lr;service=scscf;billing-role=charge-orig>
>> Record-Route: <sip:192.168.1.121:5058;transport=TCP;lr>
>> Record-Route: <sip:[email protected]:5060;transport=TCP;lr>
>> Call-ID: 2010000000-1.000000///[email protected]
>> From: <sip:[email protected]>;tag=1787SIPpTag001001234
>> To: <sip:[email protected]>;tag=z9hG4bKPjdoa02taVAMk8fNAOlz8drD7MrbjflAFY
>> CSeq: 5 INVITE
>> Content-Length:  0
>> '.
>>
>>
>>
>> I already followed the Clearwater troubleshooting guide, and the only
>> thing I've found so far is in the homestead log during start-up.
>>
>>
>>
>> The error messages are:
>>
>> 28-11-2014 15:13:36.580 UTC Error cassandra_store.cpp:170: Cache caught
>> TTransportException: connect() failed: Connection refused
>> 28-11-2014 15:13:36.580 UTC Error main.cpp:472: Failed to initialize
>> cache - rc 3
>> 28-11-2014 15:13:38.042 UTC Error cassandra_store.cpp:170: Cache caught
>> TTransportException: connect() failed: Connection refused
>> 28-11-2014 15:13:38.042 UTC Error main.cpp:472: Failed to initialize
>> cache - rc 3
>>
>>
>>
>> The full log is:
>>
>> 28-11-2014 15:13:36.572 UTC Status main.cpp:418: Log level set to 2
>> 28-11-2014 15:13:36.579 UTC Status main.cpp:438: Access logging enabled
>> to /var/log/homestead
>> 28-11-2014 15:13:36.579 UTC Status dnscachedresolver.cpp:90: Creating
>> Cached Resolver using server 127.0.0.1
>> 28-11-2014 15:13:36.579 UTC Status httpresolver.cpp:50: Created HTTP
>> resolver
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:143: Configuring
>> store
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:144:   Hostname:
>>  localhost
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:145:   Port:
>>  9160
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:146:   Threads:
>> 10
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:147:   Max Queue: 0
>> 28-11-2014 15:13:36.579 UTC Status cassandra_store.cpp:162: Starting store
>> 28-11-2014 15:13:36.580 UTC Error cassandra_store.cpp:170: Cache caught
>> TTransportException: connect() failed: Connection refused
>> 28-11-2014 15:13:36.580 UTC Error main.cpp:472: Failed to initialize
>> cache - rc 3
>> 28-11-2014 15:13:36.580 UTC Status cassandra_store.cpp:204: Stopping cache
>> 28-11-2014 15:13:36.580 UTC Status cassandra_store.cpp:214: Waiting for
>> cache to stop
>> 28-11-2014 15:13:38.037 UTC Status main.cpp:418: Log level set to 2
>> 28-11-2014 15:13:38.037 UTC Status main.cpp:438: Access logging enabled
>> to /var/log/homestead
>> 28-11-2014 15:13:38.038 UTC Status dnscachedresolver.cpp:90: Creating
>> Cached Resolver using server 127.0.0.1
>> 28-11-2014 15:13:38.038 UTC Status httpresolver.cpp:50: Created HTTP
>> resolver
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:143: Configuring
>> store
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:144:   Hostname:
>>  localhost
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:145:   Port:
>>  9160
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:146:   Threads:
>> 10
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:147:   Max Queue: 0
>> 28-11-2014 15:13:38.038 UTC Status cassandra_store.cpp:162: Starting store
>> 28-11-2014 15:13:38.042 UTC Error cassandra_store.cpp:170: Cache caught
>> TTransportException: connect() failed: Connection refused
>> 28-11-2014 15:13:38.042 UTC Error main.cpp:472: Failed to initialize
>> cache - rc 3
>> 28-11-2014 15:13:38.042 UTC Status cassandra_store.cpp:204: Stopping cache
>> 28-11-2014 15:13:38.042 UTC Status cassandra_store.cpp:214: Waiting for
>> cache to stop
>> 28-11-2014 15:13:48.070 UTC Status main.cpp:418: Log level set to 2
>> 28-11-2014 15:13:48.070 UTC Status main.cpp:438: Access logging enabled
>> to /var/log/homestead
>> 28-11-2014 15:13:48.071 UTC Status dnscachedresolver.cpp:90: Creating
>> Cached Resolver using server 127.0.0.1
>> 28-11-2014 15:13:48.071 UTC Status httpresolver.cpp:50: Created HTTP
>> resolver
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:143: Configuring
>> store
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:144:   Hostname:
>>  localhost
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:145:   Port:
>>  9160
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:146:   Threads:
>> 10
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:147:   Max Queue: 0
>> 28-11-2014 15:13:48.071 UTC Status cassandra_store.cpp:162: Starting store
>> 28-11-2014 15:13:48.073 UTC Status diameterstack.cpp:63: Initializing
>> Diameter stack
>> 28-11-2014 15:13:48.073 UTC Status diameterstack.cpp:307: Configuring
>> Diameter stack from file /var/lib/homestead/homestead.conf
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter: All extensions loaded.
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter: freeDiameter
>> configuration:
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter:   Default trace level
>> .... : +3
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter:   Configuration file
>> ..... : /var/lib/homestead/homestead.conf
>> 28-11-2014 15:13:48.280 UTC Status freeDiameter:   Diameter Identity
>> ...... : 192.168.1.113 (l:13)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Diameter Realm
>> ......... : hs.cini-clearwater.com (l:22)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Tc Timer
>> ............... : 30
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Tw Timer
>> ............... : 30
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Local port
>> ............. : 3868
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Local secure port
>> ...... : 5658
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Number of SCTP streams
>> . : 3
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Number of clients thr
>> .. : 5
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Number of app threads
>> .. : 4
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Local endpoints
>> ........ : Default (use all available)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Local applications
>> ..... : (none)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Flags : - IP
>> ........... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - IPv6
>> ......... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - Relay app
>> .... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - TCP
>> .......... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - SCTP
>> ......... : Enabled
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - Pref. proto
>> .. : SCTP
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - TLS method
>> ... : Separate port
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   TLS :   - Certificate
>> .. : /var/lib/homestead/cert.pem
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - Private key
>> .. : /var/lib/homestead/privkey.pem
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - CA (trust)
>> ... : /var/lib/homestead/ca.pem (1 certs)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - CRL
>> .......... : (none)
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - Priority
>> ..... : (default: 'NORMAL')
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:           - DH bits
>> ...... : 1024
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:   Origin-State-Id
>> ........ : 1417187628
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions:
>> '/usr/share/clearwater/homestead/lib/freeDiameter//dbg_monitor.fdx'[(no
>> config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions:
>> '/usr/share/clearwater/homestead/lib/freeDiameter//dict_nasreq.fdx'[(no
>> config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions:
>> '/usr/share/clearwater/homestead/lib/freeDiameter//dict_sip.fdx'[(no config
>> file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions:
>> '/usr/share/clearwater/homestead/lib/freeDiameter//dict_dcca.fdx'[(no
>> config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter: Loaded extensions:
>> '/usr/share/clearwater/homestead/lib/freeDiameter//dict_dcca_3gpp.fdx'[(no
>> config file)], loaded
>> 28-11-2014 15:13:48.281 UTC Status freeDiameter:
>> {signal:12}'dbg_monitor'->0x7faf1c2ebd70
>> 28-11-2014 15:13:48.281 UTC Status diameterstack.cpp:432: Starting
>> Diameter stack
>> 28-11-2014 15:13:48.288 UTC Status freeDiameter: Local server
>> address(es): 192.168.1.113{---L-}
>> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:76: Configuring
>> HssCacheTask
>> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:77:   Dest-Realm:
>> cini-clearwater.com
>> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:78:   Dest-Host:
>> 28-11-2014 15:13:48.288 UTC Status handlers.cpp:79:   Server-Name:
>> sip:sprout.cini-clearwater.com:5054;transport=TCP
>> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:131: Configuring HTTP
>> stack
>> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:132:   Bind address:
>> 192.168.1.113
>> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:133:   Bind port:    8888
>> 28-11-2014 15:13:48.288 UTC Status httpstack.cpp:134:   Num threads:  50
>> 28-11-2014 15:13:48.547 UTC Status main.cpp:597: Start-up complete - wait
>> for termination signal
>>
>>
>> 2014-11-28 10:36 GMT+01:00 Antonio Ken Iannillo <[email protected]>:
>>
>> I checked the bono node and found this in its log:
>>
>> UTC Error connection_pool.cpp:217: Failed to resolve
>> sprout.cini-clearwater.com to an IP address - Not found (PJ_ENOTFOUND)
>>
>>
>>
>> But I can ping all the other machines in the deployment by their symbolic
>> names (even sprout.cini-clearwater.com), so I can't understand this error.
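>>
>> (For what it's worth, I guess ping may be answered from /etc/hosts rather
>> than DNS, so a closer check is probably something like the following,
>> though I'm not sure that's the issue:
>>
>>   getent hosts sprout.cini-clearwater.com   # resolves via /etc/hosts and/or DNS
>>   nslookup sprout.cini-clearwater.com       # forces a real DNS lookup, closer to what bono does
>> )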
>>
>>
>>
>> Any idea?
>>
>>
>>
>> Antonio Ken Iannillo
>>
>>
>>
>>
>>
>> 2014-11-27 19:53 GMT+01:00 Eleanor Merry <[email protected]>:
>>
>> Hi,
>>
>> The stress tests shouldn't time out like that. Can you see whether either
>> of your Bonos receives the REGISTERs? If they don't, can you check that
>> the stress node can resolve the Bono address (on a SIP stress node this
>> will be either the home_domain or one of the bono_servers set in
>> /etc/clearwater/config), and that the Bono nodes will accept traffic from
>> the stress nodes.
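>>
>> A rough way to check both of these (a sketch only; tcpdump may need
>> installing, and I'm assuming the standard untrusted SIP port 5060 on Bono):
>>
>>   # on the stress node: does the Bono address resolve?
>>   nslookup <home_domain or bono_servers value from /etc/clearwater/config>
>>
>>   # on each Bono node: is any SIP arriving from the stress node?
>>   sudo tcpdump -i any -n host <stress node IP> and port 5060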
>>
>> Thanks for the documentation pointer - I'll update this.
>>
>> Ellie
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:
>> [email protected]] On Behalf Of Antonio Ken
>> Iannillo
>> Sent: 26 November 2014 18:43
>> To: [email protected]
>> Subject: [Clearwater] Connection Timeout on Large Scale Deployment
>>
>> Hi all,
>>
>> I manually installed a larger-scale deployment (two copies of every
>> service except ellis) and ran the sipp stress test.
>>
>> I got all timeouts, which I didn't get with the simple deployment. Do you
>> think this is due to greater network delay, or could I have misconfigured
>> something?
>>
>> In the sipp error logs I got something like:
>>
>> '.
>> > 2014-11-26 19:27:16:661 1417026436.661073: Aborting call on unexpected
>> > message for Call-Id '[email protected]': while expecting '401'
>> > (index 3), received 'SIP/2.0 408 Request Timeout
>> > Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24582;received=192.168.1.107;branch=z9hG4bK-9941-17-2-2010000166-
>> > Call-ID: 2010000166///[email protected]
>> > From: <sip:[email protected]>;tag=9941SIPpTag0017
>> > To: <sip:[email protected]>;tag=z9hG4bK-9941-17-2-2010000166-
>> > CSeq: 1 REGISTER
>> > Content-Length:  0
>> > '.
>>
>>
>> while in the sipp message log I have:
>>
>> > Problem EAGAIN on socket 12 and poll_idx is 2
>> > Added first buffered message to socket 12
>> > Problem EWOULDBLOCK on socket 12 and poll_idx is 2
>> > Exit problem event on socket 12
>> > Wrote 702 of 702 bytes in an output buffer.
>> > -----------------------------------------------
>> > 2014-11-26 19:27:11:27.944 TCP message received [363] bytes :
>> > SIP/2.0 408 Request Timeout
>> > Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24578;received=192.168.1.107;branch=z9hG4bK-9941-23-2-2010000154-
>> > Call-ID: 2010000154///[email protected]
>> > From: <sip:[email protected]>;tag=9941SIPpTag0023
>> > To: <sip:[email protected]>;tag=z9hG4bK-9941-23-2-2010000154-
>> > CSeq: 1 REGISTER
>> > Content-Length:  0
>> >
>> > -----------------------------------------------
>> > Unexpected TCP message received:
>> > SIP/2.0 408 Request Timeout
>> > Via: SIP/2.0/TCP 192.168.1.107:5060;rport=24578;received=192.168.1.107;branch=z9hG4bK-9941-23-2-2010000154-
>> > Call-ID: 2010000154///[email protected]
>> > From: <sip:[email protected]>;tag=9941SIPpTag0023
>> > To: <sip:[email protected]>;tag=z9hG4bK-9941-23-2-2010000154-
>> > CSeq: 1 REGISTER
>> > Content-Length:  0
>>
>>
>> I tried increasing the send and receive timeouts in the sipp command, but
>> the results didn't change.
>>
>> P.S.: Feedback on the documentation --> in the description of the
>> large-scale deployment, while clustering homestead one should uninstall
>> both homestead and homestead-prov in order to inject both schemas into
>> Cassandra. In the doc only homestead is specified.
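>>
>> (i.e. something along the lines of the following on the homestead nodes -
>> a rough sketch from memory, assuming the standard Debian packages:
>>
>>   sudo apt-get purge homestead homestead-prov
>>   # ...update /etc/clearwater/config with the clustering settings...
>>   sudo apt-get install homestead homestead-prov   # reinstalling injects both schemas into Cassandra
>> )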
>>
>> _______________________________________________
>> Clearwater mailing list
>> [email protected]
>> http://lists.projectclearwater.org/listinfo/clearwater
>>
>>
>
>
_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater
