Hello Ellie,
At the mean time when the Blink client can do REGISTRATION successfully,
the 502 Bad Gateway error occurred at the beginning of the live-test:
ubuntu@testnode:~/clearwater-live-test$ rake test[clearwater.demo]
SIGNUP_CODE=secret TESTS="*Registration"
Basic Registration (TCP) - Failed
RestClient::BadGateway thrown:
- 502 Bad Gateway
-
/home/ubuntu/.rvm/gems/ruby-1.9.3-p545/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in
`return!'
-
/home/ubuntu/.rvm/gems/ruby-1.9.3-p545/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in
`process_result'
*The ellis-0.log recorded the following error, which suggested that some
numbers are created (e.g. 6505550947), but some are not (e.g.
6505550991). Ellis log recorded the error "max_clients limit reached".**
For the created number, homer node seems to be the trouble one giving
599 time-out. For the non-created number, homestead seems to be the
trouble one giving 404 not-found (matches your speculation that ellis
thought the number is created, but homestead thinks it is not).
So I created 1000 new numbers on ellis and repeated the live-test, which
still failed at the beginning with 502 Bad Gateway. I noticed a new
number in ellis log file, 6505551212, the request to which gets 200 OK
from homestead but 599 from homer.* *The following is ellis log trace:*
2014-04-03 17:15:58,265 D numbers:128 25216:139677127358208] SIP URI
sip:*6505550947*@clearwater.demo
2014-04-03 17:15:58,319 D utils:79 25216:139677127358208] *OK HTTP
response*.
HTTPResponse(code=200,request_time=0.016737937927246094,buffer=<_io.BytesIO
object at
0x3530950>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest
object at
0x387f850>,effective_url='*http://hs.clearwater.demo:8889*/private/*6505550947*%40clearwater.demo/associated_implicit_registration_sets/6ed5a607-3d94-4719-8cc3-2a2f0b1bb92b',headers={'Content-Length':
'0', 'Content-Type': 'text/html; charset=UTF-8', 'Server':
'cyclone/1.0'},error=None)
2014-04-03 17:16:18,413 W utils:73 25216:139677127358208]*Non-OK HTTP
response*.
HTTPResponse(code=599,request_time=20.019418954849243,buffer=None,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest
object at
0x39c85d0>,effective_url='*http://homer.clearwater.demo:7888*/org.etsi.ngn.simservs/users/sip%3A*6505550947*%40clearwater.demo/simservs.xml',headers={},error=HTTPError('*HTTP
599: Timeout*',))
2014-04-03 17:16:18,414 W numbers:192 25216:139677127358208] Failed to
update all the backends
2014-04-03 17:16:39,672 D simple_httpclient:107 25216:139677127358208]
*max_clients limit reached, request queued. 10 active, 95 queued requests.*
2014-04-03 17:16:40,326 W utils:73 25216:139677127358208] Non-OK HTTP
response.
HTTPResponse(code=404,request_time=0.03327202796936035,buffer=<_io.BytesIO
object at
0x36333b0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest
object at
0x363a4d0>,effective_url='http://*hs.clearwater.demo:8889*/public/sip%3A*6505550991*%40clearwater.demo/associated_private_ids',headers={'Content-Length':
'95', 'Server': 'cyclone/1.0', 'Content-Type': 'application/json',
'Content-Encoding': 'gzip'},error=HTTPError('*HTTP 404: Not Found*',))
2014-04-03 17:39:08,169 D utils:79 25216:139677127358208] *OK HTTP
response.*
HTTPResponse(code=200,request_time=0.012949943542480469,buffer=<_io.BytesIO
object at
0x36335f0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest
object at
0x36185d0>,effective_url='http://*hs.clearwater.demo:8889*/private/*6505551212*%40clearwater.demo/associated_implicit_registration_sets/412252eb-a12a-4812-aff1-2cb978866791',headers={'Content-Length':
'0', 'Content-Type': 'text/html; charset=UTF-8', 'Server':
'cyclone/1.0'},error=None)
2014-04-03 17:39:28,242 W utils:73 25216:139677127358208] *Non-OK HTTP
response.*
HTTPResponse(code=599,request_time=20.01988410949707,buffer=None,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest
object at
0x3618350>,effective_url='*http://homer.clearwater.demo:7888*/org.etsi.ngn.simservs/users/sip%3A*6505551212*%40clearwater.demo/simservs.xml',headers={},error=HTTPError('*HTTP
599: Timeout*',))
*Since the ellis log pointed out the time-out when sending requests to
homer, **I then checked homer node (sudo monit status give all OK
status, nslookup check suggests all hostname resolutions are correct).
Then I checked homer-0.log on, and found the following (10.8.0.16 is the
ellis node):*
2014-04-03 17:15:18,277 I base:245 24493:140443843479296] Received
*request from 10.8.0.16 - PUT*
http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml
2014-04-03 17:15:18,277 I xsd:76 24493:140443843479296] Performing XSD
validation
2014-04-03 17:15:38,056 I base:245 24493:140443843479296] Received
*request from 10.8.0.16 - DELETE
*http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml
2014-04-03 17:15:58,642 I base:245 24493:140443843479296] Received
*request from 10.8.0.16 - PUT
*http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml
2014-04-03 17:15:58,643 I xsd:76 24493:140443843479296] Performing XSD
validation
2014-04-03 17:16:18,766 I base:245 24493:140443843479296] Received
*request from 10.8.0.16 - DELETE
*http://homer.clearwater.demo:7888/org.etsi.ngn.simservs/users/sip%3A6505550947%40clearwater.demo/simservs.xml
*I didn't see the request ever being returned. Hence I expect it could
be the same problem as why homestead did not respond to sprout's
REGISTER queries - cassandra dead. And I found it is the case, as
cassandra log records a number of exceptions:**
**Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "CompactionExecutor:34"**
**
**So I re-started cassandra, then homer (the same treatment you
instructed me to do with homestead). And live-test can now get pass
result for REGISTER.* *
Though the problem is solved, I'm curious how the cassandra could run
out of memory (for homestead last time with Blink client REGISTER, for
homer this time with live-test sipp scripts). Do you have some clue
about how it happens so frequent and how to prevent it from happening so
often?*
Regards,
Lilin
On 03/04/2014 12:12 PM, Eleanor Merry wrote:
Hi Lilin,
It's good to hear that the REGISTERs are now succeeding!
How are the live tests failing? Are you seeing something like the
output below?
Basic Registration (TCP) - (2010000098) Passed
Basic Registration (UDP) - (2010000098) Passed
ISC Interface - Third-party Registration (TCP) - Skipped (No hostname
given)
- Call with HOSTNAME=<publicly accessible hostname/IP of this machine>
ISC Interface - Third-party Registration (UDP) - Skipped (No hostname
given)
- Call with HOSTNAME=<publicly accessible hostname/IP of this machine>
ISC Interface - Third-party Registration - implicit registration (TCP)
- Skipped (No hostname given)
- Call with HOSTNAME=<publicly accessible hostname/IP of this machine>
ISC Interface - Third-party Registration - implicit registration (UDP)
- Skipped (No hostname given)
- Call with HOSTNAME=<publicly accessible hostname/IP of this machine>
rake aborted!
502 Bad Gateway
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/abstract_response.rb:48:in
`return!'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:230:in
`process_result'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:178:in
`block in transmit'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:172:in
`transmit'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:64:in
`execute'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/gems/rest-client-1.6.7/lib/restclient/request.rb:33:in
`execute'
/home/ubuntu/clearwater-live-test/lib/live-test.rb:71:in
`destroy_leaked_numbers'
/home/ubuntu/clearwater-live-test/lib/live-test.rb:52:in `run_tests'
/home/ubuntu/clearwater-live-test/Rakefile:39:in `block in <top
(required)>'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in
`eval'
/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/bin/ruby_noexec_wrapper:14:in
`<main>'
Tasks: TOP => test
(See full trace by running task with --trace)
In this case, the 502 bad gateway error at the end is benign, and the
basic registration tests have passed.
This error occurs during the tidy-up after the tests, and it's caused
by Ellis thinking some numbers are provisioned, and homestead-prov
thinking that they aren't. Homestead responds to queries about these
numbers with a 404 (which you've seen in the Ellis logs), which causes
Ellis to respond with a 502. This issue is being tracked at
https://github.com/Metaswitch/ellis/issues/49.
If this isn't what you're seeing, then can you let me know the errors
reported by the Basic Registration tests (they will either print an
exception to screen, or print the log file used for the tests). Can
you also see if the homestead logs are reporting problems?
Thanks,
Ellie
*From:*Lilin Zhang [mailto:[email protected]]
*Sent:* 03 April 2014 00:19
*To:* Eleanor Merry; James Coleman
*Cc:* [email protected]
*Subject:* Blink client making voice calls, BUT clearwater-live-test
failing
Hello Ellie,
Thanks for the help here! I started cassandra and homestead. The 403
Forbidden error to REGISTER is gone. The blink clients can register,
and make voice calls.
Then I started to think about using sipp scripts to test the
deployment. I believe clearwater-live-test package can do it. So I
installed live-test on one VM, testnode, and ran the rake
test[clearwater.demo] SIGNUP_CODE=secret TESTS="*Registration".
I got the 502 Bad Gateway error from the testnode. I took a look at
the ellis log file, and found that ellis got the 404 error from
homestead. For example:
2014-04-02 22:56:24,979 W utils:73 25216:139677127358208] Non-OK HTTP
response.
HTTPResponse(code=404,request_time=0.0420069694519043,buffer=<_io.BytesIO
object at
0x39b23b0>,_body=None,time_info={},request=<tornado.httpclient.HTTPRequest
object at
0x39d6750>,effective_url='http://hs.clearwater.demo:8889/public/sip%3A6505550120%40clearwater.demo/associated_private_ids',headers=
<http://hs.clearwater.demo:8889/public/sip%3A6505550120%40clearwater.demo/associated_private_ids%27,headers=>{'Content-Length':
'95', 'Server': 'cyclone/1.0', 'Content-Type': 'application/json',
'Content-Encoding': 'gzip'},error=HTTPError('HTTP 404: Not Found',))
But with the live-test failing, at the same time I can use Blink
client to REGISTER, and to make the voice calls. The successful
working of Blink client suggests that all clearwater components work
fine. But the failing live-test suggests the other way. How does this
happen?
Regards,
Lilin
_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater