Steve,

Further to the below, we’ve also fixed a couple of bugs in this area recently, most notably https://github.com/Metaswitch/sprout/issues/1570, which caused problems on Sprout overload. This particular issue was fixed in release-111 (the Rapidash release), so if you haven’t already, it would also be a good idea to try upgrading to at least that release.

Seb.

From: Clearwater [mailto:[email protected]] On Behalf Of Sebastian Rex
Sent: 20 January 2017 09:44
To: [email protected]
Subject: Re: [Project Clearwater] Sprout regularly crashes

Hi Steve,

It looks like the Sprout health monitoring script (possibly erroneously) thinks that Sprout is in a bad state and is killing it. Could you send us the monit logs from a time covering a crash? That should allow us to check whether that’s the case. They’re at /var/log/monit.log

Also, do you have debug logging turned on? If so, then the sprout logs (/var/log/sprout/sprout_x.log) would also be useful.

Could you also give us some more context here? i.e.
- How much load are you putting through this system?
- How big is your deployment? (Both the number of sprout nodes and the size of each sprout node)

(If you both have debug logging turned on and are putting load through the system, we would suggest turning debug logging off as it considerably worsens performance and makes this more likely to occur.)

Thanks,
Seb.

From: Clearwater [mailto:[email protected]] On Behalf Of Steven Adams
Sent: 06 January 2017 21:04
To: [email protected]
Subject: [Project Clearwater] Sprout regularly crashes

Started using Tauros recently and are noticing that sprout crashes a lot with this error:

Signal 6 caught

Basic stack dump:
/usr/share/clearwater/bin/sprout(_ZN6Logger9backtraceEPKc+0x6d)[0x51467d]
/usr/share/clearwater/bin/sprout(_ZN3Log9backtraceEPKcz+0x10d)[0x5d488d]
/usr/share/clearwater/bin/sprout(_Z14signal_handleri+0x2c)[0x63ce4c]
/lib/x86_64-linux-gnu/libc.so.6(+0x36cb0)[0x7fe1660a7cb0]
/lib/x86_64-linux-gnu/libpthread.so.0(sem_wait+0x2e)[0x7fe1673af66e]
/usr/share/clearwater/bin/sprout(main+0xb6fa)[0x5138ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fe166092f45]
/usr/share/clearwater/bin/sprout[0x51451c]

Advanced stack dump (requires gdb):
sh: 1: /usr/bin/gdb: not found
gdb failed with return code 32512

Typically seems to happen after sprout has received an incoming 200 OK message. Sprout does restart itself so it's not the end of the world but it doesn't bode well for stability.

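For anyone reading the crash output above, the two numbers in it can be decoded with a short script. This is only a sketch under stated assumptions: it assumes Linux signal numbering, and it assumes that the "return code 32512" is a raw wait status of the kind returned by the C system() call, which the output does not confirm. The helper names describe_signal and exit_status are made up for illustration and are not part of Sprout.

```python
import os
import signal

# Values copied from the crash output quoted in this thread.
CAUGHT_SIGNAL = 6
GDB_RETURN_CODE = 32512


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(signum).name


def exit_status(wait_status: int) -> int:
    """Extract the exit status from a raw wait status.

    Assumption: the reported return code is an undecoded wait status
    (as returned by C system()), so the exit status is in the high byte.
    """
    return os.WEXITSTATUS(wait_status)


if __name__ == "__main__":
    print("signal:", describe_signal(CAUGHT_SIGNAL))
    print("gdb exit status:", exit_status(GDB_RETURN_CODE))
```

Under those assumptions, signal 6 is SIGABRT, which would be consistent with something deliberately aborting the process, and the gdb return code decodes to exit status 127, the shell's "command not found" status, which matches the "/usr/bin/gdb: not found" line.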
