Re: High CPU Usage followed by segfault error

Soji Antony Sat, 20 Oct 2018 09:29:30 -0700

Hi

FYI, below is the kern.log entry for the segfault we are seeing, followed by
the gdb backtrace from the corresponding crash dump.


Oct 18 10:11:30  kernel: [841364.001036] haproxy[30696]: segfault at 8 ip
00005567eaf6aac2 sp 00007ffdd70447b0 error 6 in haproxy[5567eae75000+172000]
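
For anyone reading along, here is a minimal Python sketch of how such a
kern.log line can be decoded. It assumes the usual x86-64 Linux message
format and the standard x86 page-fault error-code bits (bit 0: page was
present, bit 1: write access, bit 2: user mode). The name decode_segfault is
made up for illustration, and the offset is simply the instruction pointer
minus the mapping base printed in the same line.

```python
import re

# Assumed format of the x86-64 Linux "segfault at ..." kernel message.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in \S+\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one segfault line (hypothetical helper, not an haproxy tool)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    return {
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        # Offset of the faulting instruction inside the mapped binary.
        "offset": int(m.group("ip"), 16) - int(m.group("base"), 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


# The line from kern.log above, with the mail line wrap removed.
line = ("haproxy[30696]: segfault at 8 ip 00005567eaf6aac2 "
        "sp 00007ffdd70447b0 error 6 in haproxy[5567eae75000+172000]")
info = decode_segfault(line)
print(hex(info["offset"]), info)
```

For this line, error 6 decodes to a user-mode write to a page that was not
present, at address 8, which usually points to a write through a NULL (or
nearly NULL) pointer.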

# apport-retrace -g _usr_sbin_haproxy.0.crash
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Reading symbols from /usr/sbin/haproxy...Reading symbols from
/usr/lib/debug/.build-id/56/c5ffb3112d35c68a487caa1f4b788953891ade.debug...done.
done.
[New LWP 30696]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p
/var/run/haproxy.pid -sf 30646'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005567eaf6aac2 in do_unbind_listener
(listener=listener@entry=0x5567ebd82a00,
do_close=do_close@entry=1) at src/listener.c:319
319 src/listener.c: No such file or directory.
(gdb) list
314 in src/listener.c
(gdb) backtrace
#0  0x00005567eaf6aac2 in do_unbind_listener
(listener=listener@entry=0x5567ebd82a00,
do_close=do_close@entry=1) at src/listener.c:319
#1  0x00005567eaf6b252 in unbind_listener
(listener=listener@entry=0x5567ebd82a00)
at src/listener.c:342
#2  0x00005567eaf6b308 in unbind_all_listeners (proto=0x5567eb1fbcc0
<proto_tcpv4>) at src/listener.c:365
#3  0x00005567eaf9bf0e in protocol_unbind_all () at src/protocol.c:76
#4  0x00005567eaf397d6 in deinit () at src/haproxy.c:2289
#5  0x00005567eaea1e1f in main (argc=<optimized out>, argv=<optimized out>)
at src/haproxy.c:3092

HAProxy 1.8.14 is installed from the following Ubuntu PPA:

# cat vbernat-haproxy.list
deb     http://ppa.launchpad.net/vbernat/haproxy-1.8/ubuntu trusty main
deb-src http://ppa.launchpad.net/vbernat/haproxy-1.8/ubuntu trusty main

Thanks

On Wed, Oct 17, 2018 at 9:26 AM Soji Antony <sojins...@gmail.com> wrote:

> Hi Willy / Olivier,
>
> Thank you very much for the patch & detailed explanation. I will apply
> this patch on our servers.
>
> > Unfortunately, as is often the case with gdb, that's less than useful :/
> > If you have that available, you may install the haproxy-dbg package, but
> > I'm not convinced it will yield better results.
>
> Please find attached gdb.txt, which contains the output of 'thread apply
> all bt' and 'info threads' after installing the haproxy-dbg package, in
> case that helps.
>
> > Can you share your config, obfuscating any confidential information, IP
> > addresses etc ?
>
> Please find attached haproxy.cfg; I have added a [removed] comment wherever
> I have truncated lines.
>
> > You mentioned you were getting a segfault, do you know how to
> > reproduce it ?
>
> Not sure how to reproduce it. I can see these segfault error messages in
> the kernel logs on random servers. I can try enabling crash dumps on one of
> the servers and share the details.
>
> Oct  9 16:16:35  kernel: [85669.521234] haproxy[59075]: segfault at
> 7fda1fb0fc60 ip 000055c7273b643b sp 00007fd8c2ffaab0 error 4 in
> haproxy[55c72734e000+172000]
> Oct 10 09:48:43  [148797.364018] haproxy[60048]: segfault at 8 ip
> 0000556ba5c7eac2 sp 00007ffc5ef9e730 error 6 in haproxy[556ba5b89000+172000]
> Oct 11 14:30:56  kernel: [252130.055746] haproxy[4538]: segfault at
> 7fe088e87350 ip 00005637ab43fea7 sp 00007fe0857e8c20 error 4 in
> haproxy[5637ab410000+172000]
> Oct 11 16:47:03 kernel: [260297.444482] haproxy[74455]: segfault at
> 7f07d0de7290 ip 00005574f96e1ea7 sp 00007f07ce9c6c20 error 4 in
> haproxy[5574f96b2000+172000]
> Oct 11 22:06:19 : [279453.364729] haproxy[103724]: segfault at
> 7f7e492535d0 ip 000055c8b4f1dea7 sp 00007f7e46d93c20 error 4 in
> haproxy[55c8b4eee000+172000]
> Oct 13 04:31:14 : [388948.155673] haproxy[92338]: segfault at 8 ip
> 00005583be079ac2 sp 00007ffc6cb34e60 error 6 in haproxy[5583bdf84000+172000]
> Oct 15 15:17:04  kernel: [600498.581053] haproxy[63374]: segfault at 8 ip
> 000055dd2e7d1ac2 sp 00007ffed747e1d0 error 6 in haproxy[55dd2e6dc000+172000]
>
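
As a side note, the entries above can be grouped by error code and by the
offset of the faulting instruction inside the binary (ip minus mapping base).
The sketch below is only an illustration: the regex assumes the usual x86-64
kernel message format, group_by_offset is a made-up helper name, and the
input is the log lines quoted above with the mail line wraps removed.

```python
import re
from collections import defaultdict

# Assumed x86-64 kernel segfault message format (only the fields used here).
SEGFAULT_RE = re.compile(
    r"\[(?P<pid>\d+)\]: segfault at [0-9a-f]+ ip (?P<ip>[0-9a-f]+) "
    r"sp [0-9a-f]+ error (?P<err>[0-9a-f]+) in \S+\[(?P<base>[0-9a-f]+)\+"
)

# Log lines from this thread, unwrapped.
LOG_LINES = [
    "haproxy[59075]: segfault at 7fda1fb0fc60 ip 000055c7273b643b "
    "sp 00007fd8c2ffaab0 error 4 in haproxy[55c72734e000+172000]",
    "haproxy[60048]: segfault at 8 ip 0000556ba5c7eac2 "
    "sp 00007ffc5ef9e730 error 6 in haproxy[556ba5b89000+172000]",
    "haproxy[4538]: segfault at 7fe088e87350 ip 00005637ab43fea7 "
    "sp 00007fe0857e8c20 error 4 in haproxy[5637ab410000+172000]",
    "haproxy[74455]: segfault at 7f07d0de7290 ip 00005574f96e1ea7 "
    "sp 00007f07ce9c6c20 error 4 in haproxy[5574f96b2000+172000]",
    "haproxy[103724]: segfault at 7f7e492535d0 ip 000055c8b4f1dea7 "
    "sp 00007f7e46d93c20 error 4 in haproxy[55c8b4eee000+172000]",
    "haproxy[92338]: segfault at 8 ip 00005583be079ac2 "
    "sp 00007ffc6cb34e60 error 6 in haproxy[5583bdf84000+172000]",
    "haproxy[63374]: segfault at 8 ip 000055dd2e7d1ac2 "
    "sp 00007ffed747e1d0 error 6 in haproxy[55dd2e6dc000+172000]",
]


def group_by_offset(lines):
    """Map (error code, hex offset in binary) to the list of crashing PIDs."""
    groups = defaultdict(list)
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m is None:
            continue  # skip anything that is not a segfault line
        offset = int(m.group("ip"), 16) - int(m.group("base"), 16)
        key = (int(m.group("err"), 16), hex(offset))
        groups[key].append(int(m.group("pid")))
    return dict(groups)


groups = group_by_offset(LOG_LINES)
for (err, offset), pids in sorted(groups.items()):
    print(f"error {err} at offset {offset}: pids {pids}")
```

Grouped this way, the three "error 6" entries share one offset and three of
the four "error 4" entries share another, so these look like at most three
distinct crash sites rather than seven unrelated ones, assuming the same
binary was running in each case.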
> > You also mentioned reloads are frequent, can you tell if the CPU spike
> > happens immediately after a reload ?
>
> It is very difficult to say, as reloads are quite frequent. Attaching the
> graph for your reference. The last reload happened at 16.56.19 and CPU
> usage started spiking at 16.57.30. However, the script we use to send the
> reload count to graphite may have failed because of the high CPU usage, so
> the graph might not reflect all reloads.
>
> > By the way, any reason you're running with SCHED_RR ? It might make
> > things worse during reloads by letting some threads spin on their own
> > spinlocks without offering a chance to the same thread of the other
> > process to complete its work.
>
> This was a cron job added long ago, when the 'Meltdown and Spectre' fixes
> slowed down the CPU. We were using haproxy 1.6 in single-process mode at
> that time. Later on we upgraded haproxy to 1.8.13, but the cron job was
> still enabled on our servers
> [#* * * * * /usr/bin/chrt -a -p 99 $(cat /var/run/haproxy.pid)]. Initially
> we suspected this might have caused the CPU spikes and disabled it. It is
> currently disabled on our servers, but the CPU spikes were observed even
> after disabling it.
>
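
To double-check that no thread is still running under a real-time policy
after the cron job was disabled, something like the following Python sketch
could be used on Linux. When chrt sets a priority without an explicit policy
option it uses SCHED_RR, and -a applies it to every thread, so each thread
has to be checked. The helper name thread_policies is hypothetical, and the
demo call inspects the script's own process; on a real server the PID would
come from /var/run/haproxy.pid instead.

```python
import os

# Names for the scheduling policies most likely to show up here.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def thread_policies(pid):
    """Map each thread ID of `pid` to (policy name, real-time priority)."""
    result = {}
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            policy = os.sched_getscheduler(tid)
            prio = os.sched_getparam(tid).sched_priority
        except ProcessLookupError:
            continue  # the thread exited while we were iterating
        result[tid] = (POLICY_NAMES.get(policy, str(policy)), prio)
    return result


# Demo on the current process; for haproxy, read the PID from its pid file:
#   pid = int(open("/var/run/haproxy.pid").read().split()[0])
policies = thread_policies(os.getpid())
for tid, (name, prio) in sorted(policies.items()):
    print(tid, name, prio)
```

Any thread still reported as SCHED_RR with priority 99 would mean the old
chrt setting is still in effect for that process.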
>
>
> On Tue, Oct 16, 2018 at 9:07 PM Olivier Houchard <ohouch...@haproxy.com>
> wrote:
>
>> On Tue, Oct 16, 2018 at 05:02:30PM +0200, Willy Tarreau wrote:
>> > On Tue, Oct 16, 2018 at 04:11:20PM +0200, Willy Tarreau wrote:
>> > > Could you please apply the attached patch ? I'm going to merge it
>> > > into 1.9 and we'll backport it to 1.8 later.
>> >
>> > And please add the attached one as well, which is specific to 1.8. I
>> > suspect that different versions of compiler could emit inappropriate
>> > code due to the threads_want_sync variable not being marked volatile.
>> >
>> > In your case the issue would manifest itself if you're having heavy
>> > server queueing ("maxconn" on the server lines) or if you're seeing
>> > a lot of "server up/down" events.
>> >
>> > Thanks,
>> > Willy
>>
>>
>> Nice catch !
>>
>> This one is a good candidate.
>>
>> Regards,
>>
>> Olivier
>>
>
