Hi,

Can you please send me an example of Bono restarting itself? Is monit restarting it because Bono is unresponsive, or does Bono crash and then monit restarts it?

If it’s the former, then can you please send me the monit log (in /var/log/monit.log)? If it’s the latter, then can you please send me the bono log of the crash (with the full stack trace if possible – you’ll need to install bono-dbg to produce this)?

Thanks,
Ellie

From: Duan Jingpu [mailto:[email protected]]
Sent: 21 December 2014 10:01
To: Eleanor Merry
Subject: Re: [Clearwater] Please Help Me on Issues about Clearwater Sip Stress Test!

Hi Eleanor:

I will try to disable clearwater-diags-monitor. Thank you very much for your help.

Some of my subsequent experiments showed that Bono will also restart itself under load in a test environment, but less frequently. Are these two problems related?

Thank you,
Duan Jing

On 19 December 2014 at 02:17, Eleanor Merry <[email protected]> wrote:

Hi Richard,

It sounds like you could be hitting this issue - https://github.com/Metaswitch/clearwater-infrastructure/issues/96 - where the diags collection script causes the Sprout node to continually restart itself under load (this issue is exacerbated in a test deployment with a single Sprout, rather than a deployment with multiple Sprouts, and a more realistic load ramp-up).

Can you please try disabling the clearwater-diags-monitor for now (service clearwater-diags-monitor stop)?

Ellie

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of ???
Sent: 14 December 2014 18:13
To: [email protected]
Subject: [Clearwater] Please Help Me on Issues about Clearwater Sip Stress Test!

Dear Clearwater Developers:

Currently, I'm running the Clearwater SIP stress test against a small Clearwater cluster, and I'm encountering some problems with the test load. Could you please help me understand my problem?

My test environment consists of 4 VMs interconnected by a LAN, running Bono, Sprout, Homestead and Homer respectively.
I have another VM running the clearwater-sip-stress package. This additional VM generates SIP traffic to the Clearwater cluster. I provisioned the Homestead and Homer nodes with 200000 subscribers. The VMs running Bono and Sprout are configured with 2 cores and 4GB RAM. The VMs running Homestead and Homer are configured with 4 cores and 8GB RAM because I noticed that the cassandra service consumes a lot of memory and CPU.

Currently, when the SIP stress node generates traffic for 80000 or 90000 subscribers following the steps in the Clearwater docs, the whole cluster runs smoothly. But when the SIP stress node generates traffic for 100000 subscribers, the cluster can't operate normally. First, I can observe many errors in call_load2_xxx_errors.log (where xxx is the current sipp pid). Second, the Sprout node constantly restarts itself because of signal 6 or signal 11.

My question is this: according to the performance page on the official website, it seems that each Sprout node can handle 551k subscribers and 500k BHCA, which is much larger than my current figures. Is the behaviour I see when I run the stress test for 100000 subscribers caused by the Sprout node being overloaded, or by some other error?

Thank you very much!
Richard Duan

*This is the sprout log file before signal 11 happens:*

14-12-2014 16:16:36.224 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:36.353 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:36.682 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:36.714 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:36.717 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:36.720 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:36.802 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:36.846 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:37.372 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:38.044 UTC Warning scscfsproutlet.cpp:1472: Cannot determine charging role as no Route header, assume originating
14-12-2014 16:16:57.299 UTC Error httpconnection.cpp:579: http://hs.cw.t:8888/impu/sip%3A2010075097%40cw.t/reg-data?private_id=2010075097%40cw.t failed at server 192.168.126.41 : Timeout was reached (28) : fatal
14-12-2014 16:25:28.642 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010023992%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:28.704 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010000650%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:29.643 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010023992%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:29.643 UTC Error httpconnection.cpp:692: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500
14-12-2014 16:25:29.705 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010000650%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:29.705 UTC Error httpconnection.cpp:692: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500
14-12-2014 16:25:29.857 UTC Error httpconnection.cpp:579: http://hs.cw.t:8888/impi/2010087933%40cw.t/av?impu=sip%3A2010087933%40cw.t failed at server 192.168.126.41 : Timeout was reached (28) : fatal
14-12-2014 16:25:29.900 UTC Error httpconnection.cpp:579: http://hs.cw.t:8888/impu/sip%3A2010063922%40cw.t/reg-data failed at server 192.168.126.41 : Timeout was reached (28) : fatal
14-12-2014 16:25:30.322 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010038720%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:30.647 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010023993%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:30.710 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010000651%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:30.901 UTC Error httpconnection.cpp:579: http://hs.cw.t:8888/impu/sip%3A2010063922%40cw.t/reg-data failed at server 192.168.126.41 : Timeout was reached (28) : fatal
14-12-2014 16:25:30.901 UTC Error httpconnection.cpp:692: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500
14-12-2014 16:25:30.905 UTC Error hssconnection.cpp:589: Could not get subscriber data from HSS
14-12-2014 16:25:31.323 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010038720%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:31.324 UTC Error httpconnection.cpp:692: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500
14-12-2014 16:25:31.538 UTC Error httpconnection.cpp:579: http://hs.cw.t:8888/impu/sip%3A2010059772%40cw.t/reg-data failed at server 192.168.126.41 : Timeout was reached (28) : fatal
14-12-2014 16:25:31.648 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010023993%40cw.t/simservs.xml failed at server 192.168.126.51 : Timeout was reached (28) : fatal
14-12-2014 16:25:31.648 UTC Error httpconnection.cpp:692: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500

Signal 11 caught

Basic stack dump:
/usr/share/clearwater/bin/sprout(_ZN6Logger9backtraceEPKc+0x6d)[0x4a0b1d]
/usr/share/clearwater/bin/sprout(_ZN3Log9backtraceEPKcz+0x10d)[0x55281d]
/usr/share/clearwater/bin/sprout(_Z17exception_handleri+0x29)[0x5989d9]
/lib/x86_64-linux-gnu/libc.so.6(+0x36150)[0x7fe417dd6150]
/usr/share/clearwater/bin/sprout(_ZN16SproutletWrapper11rx_responseEP13pjsip_tx_datai+0x5d)[0x592a0d]
/usr/share/clearwater/bin/sprout(_ZN14SproutletProxy6UASTsx11tx_responseEP16SproutletWrapperP13pjsip_tx_data+0xdb)[0x592c1b]
/usr/share/clearwater/bin/sprout(_ZN16SproutletWrapper15process_actionsEb+0xa0)[0x5921c0]
/usr/share/clearwater/bin/sprout(_ZN14SproutletProxy6UASTsx22on_new_client_responseEPN10BasicProxy6UACTsxEP13pjsip_tx_data+0xe6)[0x593b56]
/usr/share/clearwater/bin/sprout(_ZN10BasicProxy6UACTsx12on_tsx_stateEP11pjsip_event+0x68f)[0x56de5f]
/usr/share/clearwater/bin/sprout[0x5dbe64]
/usr/share/clearwater/bin/sprout[0x5df5d7]
/usr/share/clearwater/bin/sprout(pjsip_tsx_recv_msg+0xb1)[0x5dd08e]
/usr/share/clearwater/bin/sprout[0x5db3ae]
/usr/share/clearwater/bin/sprout(pjsip_endpt_process_rx_data+0x23b)[0x5c552c]
/usr/share/clearwater/bin/sprout[0x4a27d1]
/usr/share/clearwater/bin/sprout[0x5fa558]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7fe418978e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe417e9431d]

Advanced stack dump (requires gdb):
sh: 1: /usr/bin/gdb: not found
gdb failed with return code 32512

14-12-2014 16:25:43.247 UTC Status main.cpp:1082: Access logging enabled to /var/log/sprout
14-12-2014 16:25:43.247 UTC Status main.cpp:1086: Log level set to 2
14-12-2014 16:25:43.247 UTC Status load_monitor.cpp:93: Constructing LoadMonitor
14-12-2014 16:25:43.247 UTC Status load_monitor.cpp:94: Target latency (usecs) : 100000
14-12-2014 16:25:43.247 UTC Status load_monitor.cpp:95: Max bucket size : 20
14-12-2014 16:25:43.247 UTC Status load_monitor.cpp:96: Initial token fill rate/s: 10.000000
14-12-2014 16:25:43.247 UTC Status load_monitor.cpp:97: Min token fill rate/s : 10.000000
14-12-2014 16:25:43.247 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
14-12-2014 16:25:43.247 UTC Status sipresolver.cpp:59: Created SIP resolver
14-12-2014 16:25:43.284 UTC Status stack.cpp:690: Listening on port 5054
14-12-2014 16:25:43.287 UTC Status stack.cpp:1094: Local host aliases:
14-12-2014 16:25:43.287 UTC Status stack.cpp:1099: 192.168.124.51
14-12-2014 16:25:43.287 UTC Status stack.cpp:1099: sprout.cw.t
14-12-2014 16:25:43.287 UTC Status stack.cpp:1099: 192.168.124.51
14-12-2014 16:25:43.288 UTC Status httpresolver.cpp:50: Created HTTP resolver
14-12-2014 16:25:43.288 UTC Status main.cpp:1295: Creating connection to HSS hs.cw.t:8888
14-12-2014 16:25:43.291 UTC Status httpconnection.cpp:119: HttpConnection for server hs.cw.t:8888
14-12-2014 16:25:43.291 UTC Status httpconnection.cpp:120: Response timeout: 500
14-12-2014 16:25:43.292 UTC Status main.cpp:1334: Creating connection to Chronos 192.168.124.51:7253 using 127.0.0.1:9888 as the callback URI
14-12-2014 16:25:43.292 UTC Status httpconnection.cpp:149: HttpConnection for server 192.168.124.51:7253
14-12-2014 16:25:43.292 UTC Status httpconnection.cpp:150: Response timeout: 500
14-12-2014 16:25:43.292 UTC Status main.cpp:1404: Using memcached compatible store with ASCII protocol
14-12-2014 16:25:43.292 UTC Status memcachedstore.cpp:784: Reloading memcached configuration from /etc/clearwater/cluster_settings file
14-12-2014 16:25:43.292 UTC Status memcachedstore.cpp:801: servers= 192.168.124.51:11211
14-12-2014 16:25:43.292 UTC Status memcachedstore.cpp:140: Updating memcached store configuration
14-12-2014 16:25:43.292 UTC Status memcachedstore.cpp:161: Finished preparing new view, so flag that workers should switch to it
14-12-2014 16:25:43.292 UTC Status main.cpp:1459: Initialise S-CSCF authentication module
14-12-2014 16:25:43.292 UTC Status pluginloader.cpp:63: Loading plug-ins from /usr/share/clearwater/sprout/plugins
14-12-2014 16:25:43.292 UTC Status pluginloader.cpp:82: Attempt to load plug-in /usr/share/clearwater/sprout/plugins/sprout_bgcf.so
14-12-2014 16:25:43.302 UTC Status bgcfservice.cpp:71: No BGCF configuration (file ./bgcf.json does not exist)
14-12-2014 16:25:43.302 UTC Status pluginloader.cpp:106: Loaded sproutlet bgcf using API version 1
14-12-2014 16:25:43.302 UTC Status pluginloader.cpp:82: Attempt to load plug-in /usr/share/clearwater/sprout/plugins/sprout_mmtel_as.so
14-12-2014 16:25:43.303 UTC Status mmtelasplugin.cpp:87: Creating connection to XDMS homer.cw.t:7888
14-12-2014 16:25:43.303 UTC Status httpconnection.cpp:119: HttpConnection for server homer.cw.t:7888
14-12-2014 16:25:43.303 UTC Status httpconnection.cpp:120: Response timeout: 500
14-12-2014 16:25:43.303 UTC Status pluginloader.cpp:106: Loaded sproutlet mmtel using API version 1
14-12-2014 16:25:43.303 UTC Status pluginloader.cpp:82: Attempt to load plug-in /usr/share/clearwater/sprout/plugins/sprout_scscf.so
14-12-2014 16:25:43.304 UTC Status pluginloader.cpp:106: Loaded sproutlet scscf using API version 1
14-12-2014 16:25:43.304 UTC Status pluginloader.cpp:82: Attempt to load plug-in /usr/share/clearwater/sprout/plugins/sprout_icscf.so
14-12-2014 16:25:43.305 UTC Status pluginloader.cpp:137: Finished loading plug-ins
14-12-2014 16:25:43.312 UTC Status httpstack.cpp:131: Configuring HTTP stack
14-12-2014 16:25:43.312 UTC Status httpstack.cpp:132: Bind address: 192.168.124.51
14-12-2014 16:25:43.312 UTC Status httpstack.cpp:133: Bind port: 9888
14-12-2014 16:25:43.312 UTC Status httpstack.cpp:134: Num threads: 1

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater
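
When triaging a Sprout log like the one quoted in this thread, it can help to count the HTTP timeouts per backend host (Homestead at hs.cw.t, Homer at homer.cw.t) and to note which signals were caught. The following is a minimal Python sketch, not part of Clearwater: the function name `summarize` and the regular expressions are illustrative assumptions based only on the line format visible in the log above, and the sketch assumes the log file has one entry per line.

```python
import re
from collections import Counter

# Assumed line format, taken from the log quoted in this thread, e.g.
# "14-12-2014 16:25:28.642 UTC Error httpconnection.cpp:579: http://homer.cw.t:7888/...
#  failed at server 192.168.126.51 : Timeout was reached (28) : fatal"
TIMEOUT_RE = re.compile(
    r"^\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} UTC Error "
    r"httpconnection\.cpp:\d+: http://(?P<host>[^/:]+)(?::\d+)?/\S* "
    r"failed at server \S+ : Timeout was reached"
)
# Crash marker as it appears in the log ("Signal 11 caught").
SIGNAL_RE = re.compile(r"^Signal (?P<num>\d+) caught")


def summarize(lines):
    """Return (timeouts per backend host, list of caught signal numbers)."""
    timeouts = Counter()
    signals = []
    for raw in lines:
        line = raw.strip()
        match = TIMEOUT_RE.match(line)
        if match:
            timeouts[match.group("host")] += 1
            continue
        match = SIGNAL_RE.match(line)
        if match:
            signals.append(int(match.group("num")))
    return timeouts, signals


if __name__ == "__main__":
    # A few entries copied from the log in this thread.
    sample = [
        "14-12-2014 16:25:28.642 UTC Error httpconnection.cpp:579: "
        "http://homer.cw.t:7888/org.etsi.ngn.simservs/users/sip%3A2010023992%40cw.t/simservs.xml "
        "failed at server 192.168.126.51 : Timeout was reached (28) : fatal",
        "14-12-2014 16:25:29.900 UTC Error httpconnection.cpp:579: "
        "http://hs.cw.t:8888/impu/sip%3A2010063922%40cw.t/reg-data "
        "failed at server 192.168.126.41 : Timeout was reached (28) : fatal",
        "Signal 11 caught",
    ]
    timeouts, signals = summarize(sample)
    for host, count in timeouts.most_common():
        print(f"{host}: {count} timeout(s)")
    print("signals caught:", signals)
```

To run it against a real log, pass an open file object to `summarize`. A high timeout count for one host would only suggest where to look next; it does not by itself explain the signal 11 crash.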
