Hello Ellie,

At the mean time when the Blink client can do REGISTRATION successfully, the 502 Bad Gateway error occurred at the beginning of the live-test: ubuntu@testnode:~/clearwater-live-test$ rake test[clearwater.demo] SIGNUP_CODE=secret TESTS="*Registration"
Basic Registration (TCP) - Failed
  RestClient::BadGateway thrown:
   - 502 Bad Gateway
- /home/ubuntu/.rvm/gems/ruby-1.9.3-p545/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in `return!' - /home/ubuntu/.rvm/gems/ruby-1.9.3-p545/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in `process_result'

*The ellis-0.log recorded the following error, which suggested that some numbers are created (e.g. 6505550947), but some are not (e.g. 6505550991). Ellis log recorded the error "max_clients limit reached".**

For the created number, homer node seems to be the trouble one giving 599 time-out. For the non-created number, homestead seems to be the trouble one giving 404 not-found (matches your speculation that ellis thought the number is created, but homestead thinks it is not).

So I created 1000 new numbers on ellis and repeated the live-test, which still failed at the beginning with 502 Bad Gateway. I noticed a new number in ellis log file, 6505551212, the request to which gets 200 OK from homestead but 599 from homer.* *The following is ellis log trace:*

2014-04-03 17:15:58,265 D numbers:128 25216:139677127358208] SIP URI sip:*6505550947*@clearwater.demo

2014-04-03 17:15:58,319 D utils:79 25216:139677127358208] *OK HTTP response*. HTTPResponse(code=200,request_time=0.016737937927246094,buffer=<_io.BytesIO object at 0x3530950>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest object at 0x387f850>,effective_url='*http://hs.clearwater.demo:8889*/private/*6505550947*%40clearwater.demo/associated_implicit_registration_sets/6ed5a607-3d94-4719-8cc3-2a2f0b1bb92b',headers={'Content-Length': '0', 'Content-Type': 'text/html; charset=UTF-8', 'Server': 'cyclone/1.0'},error=None)

2014-04-03 17:16:18,413 W utils:73 25216:139677127358208]*Non-OK HTTP response*. HTTPResponse(code=599,request_time=20.019418954849243,buffer=None,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest object at 0x39c85d0>,effective_url='*http://homer.clearwater.demo:7888*/org.etsi.ngn.simservs/users/sip%3A*6505550947*%40clearwater.demo/simservs.xml',headers={},error=HTTPError('*HTTP 599: Timeout*',)) 2014-04-03 17:16:18,414 W numbers:192 25216:139677127358208] Failed to update all the backends

2014-04-03 17:16:39,672 D simple_httpclient:107 25216:139677127358208] *max_clients limit reached, request queued. 10 active, 95 queued requests.*

2014-04-03 17:16:40,326 W utils:73 25216:139677127358208] Non-OK HTTP response. HTTPResponse(code=404,request_time=0.03327202796936035,buffer=<_io.BytesIO object at 0x36333b0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest object at 0x363a4d0>,effective_url='http://*hs.clearwater.demo:8889*/public/sip%3A*6505550991*%40clearwater.demo/associated_private_ids',headers={'Content-Length': '95', 'Server': 'cyclone/1.0', 'Content-Type': 'application/json', 'Content-Encoding': 'gzip'},error=HTTPError('*HTTP 404: Not Found*',))

2014-04-03 17:39:08,169 D utils:79 25216:139677127358208] *OK HTTP response.* HTTPResponse(code=200,request_time=0.012949943542480469,buffer=<_io.BytesIO object at 0x36335f0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest object at 0x36185d0>,effective_url='http://*hs.clearwater.demo:8889*/private/*6505551212*%40clearwater.demo/associated_implicit_registration_sets/412252eb-a12a-4812-aff1-2cb978866791',headers={'Content-Length': '0', 'Content-Type': 'text/html; charset=UTF-8', 'Server': 'cyclone/1.0'},error=None)

2014-04-03 17:39:28,242 W utils:73 25216:139677127358208] *Non-OK HTTP response.* HTTPResponse(code=599,request_time=20.01988410949707,buffer=None,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest object at 0x3618350>,effective_url='*http://homer.clearwater.demo:7888*/org.etsi.ngn.simservs/users/sip%3A*6505551212*%40clearwater.demo/simservs.xml',headers={},error=HTTPError('*HTTP 599: Timeout*',))

*Since the ellis log pointed out the time-out when sending requests to homer, **I then checked homer node (sudo monit status give all OK status, nslookup check suggests all hostname resolutions are correct). Then I checked homer-0.log on, and found the following (10.8.0.16 is the ellis node):*

2014-04-03 17:15:18,277 I base:245 24493:140443843479296] Received *request from 10.8.0.16 - PUT* http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml 2014-04-03 17:15:18,277 I xsd:76 24493:140443843479296] Performing XSD validation 2014-04-03 17:15:38,056 I base:245 24493:140443843479296] Received *request from 10.8.0.16 - DELETE *http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml 2014-04-03 17:15:58,642 I base:245 24493:140443843479296] Received *request from 10.8.0.16 - PUT *http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml 2014-04-03 17:15:58,643 I xsd:76 24493:140443843479296] Performing XSD validation 2014-04-03 17:16:18,766 I base:245 24493:140443843479296] Received *request from 10.8.0.16 - DELETE *http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml

*I didn't see the request ever being returned. Hence I expect it could be the same problem as why homestead did not respond to sprout's REGISTER queries - cassandra dead. And I found it is the case, as cassandra log records a number of exceptions:** **Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "CompactionExecutor:34"**
**
**So I re-started cassandra, then homer (the same treatment you instructed me to do with homestead). And live-test can now get pass result for REGISTER.* *

Though the problem is solved, I'm curious how the cassandra could run out of memory (for homestead last time with Blink client REGISTER, for homer this time with live-test sipp scripts). Do you have some clue about how it happens so frequent and how to prevent it from happening so often?*

Regards,
Lilin

On 03/04/2014 12:12 PM, Eleanor Merry wrote:

Hi Lilin,

It's good to hear that the REGISTERs are now succeeding!

How are the live tests failing? Are you seeing something like the output below?

Basic Registration (TCP) - (2010000098) Passed

Basic Registration (UDP) - (2010000098) Passed

ISC Interface - Third-party Registration (TCP) - Skipped (No hostname given)

- Call with HOSTNAME=<publicly accessible hostname/IP of this machine>

ISC Interface - Third-party Registration (UDP) - Skipped (No hostname given)

- Call with HOSTNAME=<publicly accessible hostname/IP of this machine>

ISC Interface - Third-party Registration - implicit registration (TCP) - Skipped (No hostname given)

- Call with HOSTNAME=<publicly accessible hostname/IP of this machine>

ISC Interface - Third-party Registration - implicit registration (UDP) - Skipped (No hostname given)

- Call with HOSTNAME=<publicly accessible hostname/IP of this machine>

rake aborted!

502 Bad Gateway

/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in `return!'

/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in `process_result'

/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:178:in `block in transmit'

/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:172:in `transmit'

/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in `execute'

/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in `execute'

/home/ubuntu/clearwater-live-test/lib/live-test.rb:71:in `destroy_leaked_numbers'

/home/ubuntu/clearwater-live-test/lib/live-test.rb:52:in `run_tests'

/home/ubuntu/clearwater-live-test/Rakefile:39:in `block in <top (required)>'

/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in `eval'

/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in `<main>'

Tasks: TOP => test

(See full trace by running task with --trace)

In this case, the 502 bad gateway error at the end is benign, and the basic registration tests have passed.

This error occurs during the tidy-up after the tests, and it's caused by Ellis thinking some numbers are provisioned, and homestead-prov thinking that they aren't. Homestead responds to queries about these numbers with a 404 (which you've seen in the Ellis logs), which causes Ellis to respond with a 502. This issue is being tracked at https://github.com/Metaswitch/ellis/issues/49.

If this isn't what you're seeing, then can you let me know the errors reported by the Basic Registration tests (they will either print an exception to screen, or print the log file used for the tests). Can you also see if the homestead logs are reporting problems?

Thanks,

Ellie

*From:*Lilin Zhang [mailto:[email protected]]
*Sent:* 03 April 2014 00:19
*To:* Eleanor Merry; James Coleman
*Cc:* [email protected]
*Subject:* Blink client making voice calls, BUT clearwater-live-test failing

Hello Ellie,

Thanks for the help here! I started cassandra and homestead. The 403 Forbidden error to REGISTER is gone. The blink clients can register, and make voice calls.

Then I started to think about using sipp scripts to test the deployment. I believe clearwater-live-test package can do it. So I installed live-test on one VM, testnode, and ran the rake test[clearwater.demo] SIGNUP_CODE=secret TESTS="*Registration".

I got the 502 Bad Gateway error from the testnode. I took a look at the ellis log file, and found that ellis got the 404 error from homestead. For example:

2014-04-02 22:56:24,979 W utils:73 25216:139677127358208] Non-OK HTTP response. HTTPResponse(code=404,request_time=0.0420069694519043,buffer=<_io.BytesIO object at 0x39b23b0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest object at 0x39d6750>,effective_url='http://hs.clearwater.demo:8889/public/sip%3A6505550120%40clearwater.demo/associated_private_ids',headers= <http://hs.clearwater.demo:8889/public/sip%3A6505550120%40clearwater.demo/associated_private_ids%27,headers=>{'Content-Length': '95', 'Server': 'cyclone/1.0', 'Content-Type': 'application/json', 'Content-Encoding': 'gzip'},error=HTTPError('HTTP 404: Not Found',))

But with the live-test failing, at the same time I can use Blink client to REGISTER, and to make the voice calls. The successful working of Blink client suggests that all clearwater components work fine. But the failing live-test suggests the other way. How does this happen?

Regards,
Lilin


_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater

Reply via email to