Hi Lilin,

I'm glad to hear that the live-tests are now passing.
We're tracking the Cassandra issue at 
(https://github.com/Metaswitch/crest/issues/148). Can you let us know if you 
see this again - particularly if you see it on both Homestead and Homer?

Ellie

From: Lilin Zhang [mailto:[email protected]]
Sent: 03 April 2014 19:07
To: Eleanor Merry
Cc: [email protected]
Subject: Re: [solved but having new questions] Blink client making voice calls, 
BUT clearwater-live-test failing

Hello Ellie,

At the mean time when the Blink client can do REGISTRATION successfully, the 
502 Bad Gateway error occurred at the beginning of the live-test:
ubuntu@testnode:~/clearwater-live-test$ rake test[clearwater.demo] 
SIGNUP_CODE=secret TESTS="*Registration"
Basic Registration (TCP) - Failed
  RestClient::BadGateway thrown:
   - 502 Bad Gateway
     - 
/home/ubuntu/.rvm/gems/ruby-1.9.3-p545/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in
 `return!'
     - 
/home/ubuntu/.rvm/gems/ruby-1.9.3-p545/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in
 `process_result'

The ellis-0.log recorded the following error, which suggested that some numbers 
are created (e.g. 6505550947), but some are not (e.g. 6505550991). Ellis log 
recorded the error "max_clients limit reached".

For the created number, homer node seems to be the trouble one giving 599 
time-out. For the non-created number, homestead seems to be the trouble one 
giving 404 not-found (matches your speculation that ellis thought the number is 
created, but homestead thinks it is not).

So I created 1000 new numbers on ellis and repeated the live-test, which still 
failed at the beginning with 502 Bad Gateway. I noticed a new number in ellis 
log file, 6505551212, the request to which gets 200 OK from homestead but 599 
from homer. The following is ellis log trace:

2014-04-03 17:15:58,265 D numbers:128 25216:139677127358208] SIP URI 
sip:[email protected]

2014-04-03 17:15:58,319 D utils:79 25216:139677127358208] OK HTTP response. 
HTTPResponse(code=200,request_time=0.016737937927246094,buffer=<_io.BytesIO 
object at 
0x3530950>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest 
object at 
0x387f850>,effective_url='http://hs.clearwater.demo:8889/private/6505550947%40clearwater.demo/associated_implicit_registration_sets/6ed5a607-3d94-4719-8cc3-2a2f0b1bb92b',headers={'Content-Length':
 '0', 'Content-Type': 'text/html; charset=UTF-8', 'Server': 
'cyclone/1.0'},error=None)

2014-04-03 17:16:18,413 W utils:73 25216:139677127358208] Non-OK HTTP response. 
HTTPResponse(code=599,request_time=20.019418954849243,buffer=None,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest
 object at 
0x39c85d0>,effective_url='http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml',headers={},error=HTTPError('HTTP
 599: Timeout',))
2014-04-03 17:16:18,414 W numbers:192 25216:139677127358208] Failed to update 
all the backends

2014-04-03 17:16:39,672 D simple_httpclient:107 25216:139677127358208] 
max_clients limit reached, request queued. 10 active, 95 queued requests.

2014-04-03 17:16:40,326 W utils:73 25216:139677127358208] Non-OK HTTP response. 
HTTPResponse(code=404,request_time=0.03327202796936035,buffer=<_io.BytesIO 
object at 
0x36333b0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest 
object at 
0x363a4d0>,effective_url='http://hs.clearwater.demo:8889/public/sip%3A6505550991%40clearwater.demo/associated_private_ids',headers={'Content-Length':
 '95', 'Server': 'cyclone/1.0', 'Content-Type': 'application/json', 
'Content-Encoding': 'gzip'},error=HTTPError('HTTP 404: Not Found',))

2014-04-03 17:39:08,169 D utils:79 25216:139677127358208] OK HTTP response. 
HTTPResponse(code=200,request_time=0.012949943542480469,buffer=<_io.BytesIO 
object at 
0x36335f0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest 
object at 
0x36185d0>,effective_url='http://hs.clearwater.demo:8889/private/6505551212%40clearwater.demo/associated_implicit_registration_sets/412252eb-a12a-4812-aff1-2cb978866791',headers={'Content-Length':
 '0', 'Content-Type': 'text/html; charset=UTF-8', 'Server': 
'cyclone/1.0'},error=None)

2014-04-03 17:39:28,242 W utils:73 25216:139677127358208] Non-OK HTTP response. 
HTTPResponse(code=599,request_time=20.01988410949707,buffer=None,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest
 object at 
0x3618350>,effective_url='http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505551212%40clearwater.demo/simservs.xml',headers={},error=HTTPError('HTTP
 599: Timeout',))

Since the ellis log pointed out the time-out when sending requests to homer,  I 
then checked homer node (sudo monit status give all OK status, nslookup check 
suggests all hostname resolutions are correct). Then I checked homer-0.log on, 
and found the following (10.8.0.16 is the ellis node):

2014-04-03 17:15:18,277 I base:245 24493:140443843479296] Received request from 
10.8.0.16 - PUT 
http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml
2014-04-03 17:15:18,277 I xsd:76 24493:140443843479296] Performing XSD 
validation
2014-04-03 17:15:38,056 I base:245 24493:140443843479296] Received request from 
10.8.0.16 - DELETE 
http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml
2014-04-03 17:15:58,642 I base:245 24493:140443843479296] Received request from 
10.8.0.16 - PUT 
http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml
2014-04-03 17:15:58,643 I xsd:76 24493:140443843479296] Performing XSD 
validation
2014-04-03 17:16:18,766 I base:245 24493:140443843479296] Received request from 
10.8.0.16 - DELETE 
http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml

I didn't see the request ever being returned. Hence I expect it could be the 
same problem as why homestead did not respond to sprout's REGISTER queries - 
cassandra dead. And I found it is the case, as cassandra log records a number 
of exceptions:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler 
in thread "CompactionExecutor:34"

So I re-started cassandra, then homer (the same treatment you instructed me to 
do with homestead). And live-test can now get pass result for REGISTER.

Though the problem is solved, I'm curious how the cassandra could run out of 
memory (for homestead last time with Blink client REGISTER, for homer this time 
with live-test sipp scripts). Do you have some clue about how it happens so 
frequent and how to prevent it from happening so often?

Regards,
Lilin

On 03/04/2014 12:12 PM, Eleanor Merry wrote:
Hi Lilin,

It's good to hear that the REGISTERs are now succeeding!

How are the live tests failing? Are you seeing something like the output below?

Basic Registration (TCP) - (2010000098) Passed
Basic Registration (UDP) - (2010000098) Passed
ISC Interface - Third-party Registration (TCP) - Skipped (No hostname given)
   - Call with HOSTNAME=<publicly accessible hostname/IP of this machine>
ISC Interface - Third-party Registration (UDP) - Skipped (No hostname given)
   - Call with HOSTNAME=<publicly accessible hostname/IP of this machine>
ISC Interface - Third-party Registration - implicit registration (TCP) - 
Skipped (No hostname given)
   - Call with HOSTNAME=<publicly accessible hostname/IP of this machine>
ISC Interface - Third-party Registration - implicit registration (UDP) - 
Skipped (No hostname given)
   - Call with HOSTNAME=<publicly accessible hostname/IP of this machine>
rake aborted!
502 Bad Gateway
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in
 `return!'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in
 `process_result'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:178:in
 `block in transmit'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:172:in
 `transmit'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in
 `execute'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in
 `execute'
/home/ubuntu/clearwater-live-test/lib/live-test.rb:71:in 
`destroy_leaked_numbers'
/home/ubuntu/clearwater-live-test/lib/live-test.rb:52:in `run_tests'
/home/ubuntu/clearwater-live-test/Rakefile:39:in `block in <top (required)>'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in `eval'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in `<main>'
Tasks: TOP => test
(See full trace by running task with --trace)

In this case, the 502 bad gateway error at the end is benign, and the basic 
registration tests have passed.
This error occurs during the tidy-up after the tests, and it's caused by Ellis 
thinking some numbers are provisioned, and homestead-prov thinking that they 
aren't. Homestead responds to queries about these numbers with a 404 (which 
you've seen in the Ellis logs), which causes Ellis to respond with a 502. This 
issue is being tracked at https://github.com/Metaswitch/ellis/issues/49.

If this isn't what you're seeing, then can you let me know the errors reported 
by the Basic Registration tests (they will either print an exception to screen, 
or print the log file used for the tests). Can you also see if the homestead 
logs are reporting problems?

Thanks,

Ellie
From: Lilin Zhang [mailto:[email protected]]
Sent: 03 April 2014 00:19
To: Eleanor Merry; James Coleman
Cc: 
[email protected]<mailto:[email protected]>
Subject: Blink client making voice calls, BUT clearwater-live-test failing

Hello Ellie,

Thanks for the help here! I started cassandra and homestead. The 403 Forbidden 
error to REGISTER is gone. The blink clients can register, and make voice calls.

Then I started to think about using sipp scripts to test the deployment. I 
believe clearwater-live-test package can do it. So I installed live-test on one 
VM, testnode, and ran the rake test[clearwater.demo] SIGNUP_CODE=secret 
TESTS="*Registration".

I got the 502 Bad Gateway error from the testnode. I took a look at the ellis 
log file, and found that ellis got the 404 error from homestead. For example:

2014-04-02 22:56:24,979 W utils:73 25216:139677127358208] Non-OK HTTP response. 
HTTPResponse(code=404,request_time=0.0420069694519043,buffer=<_io.BytesIO 
object at 
0x39b23b0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest 
object at 
0x39d6750>,effective_url='http://hs.clearwater.demo:8889/public/sip%3A6505550120%40clearwater.demo/associated_private_ids',headers=<http://hs.clearwater.demo:8889/public/sip%3A6505550120%40clearwater.demo/associated_private_ids%27,headers=>{'Content-Length':
 '95', 'Server': 'cyclone/1.0', 'Content-Type': 'application/json', 
'Content-Encoding': 'gzip'},error=HTTPError('HTTP 404: Not Found',))

But with the live-test failing, at the same time I can use Blink client to 
REGISTER, and to make the voice calls. The successful working of Blink client 
suggests that all clearwater components work fine. But the failing live-test 
suggests the other way. How does this happen?

Regards,
Lilin

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater

Reply via email to