Hi,

We have been using haproxy for a couple of years and have found it very stable. However, last week our primary haproxy hit 100% user CPU and then stopped responding to all requests, which took our web sites completely down. At that time we were running haproxy 1.4.10. We immediately upgraded to 1.4.23, but two days later the 100% user CPU problem occurred again. We then upgraded to 1.5-dev18, and today the 100% CPU problem occurred on 1.5-dev18 as well.
When all of this happened, the haproxy configuration had not changed in over half a year. So we do not think the problem was triggered by a configuration change, and we suspect that specific traffic is causing it. We also do not think it is a hardware-specific issue: when we switched the web traffic to the backup haproxy server, the hang occurred there too, and then on a third backup haproxy server, each time after only a couple of minutes of running.

So far, we have taken the following troubleshooting steps:

1) Checked all the Linux logs for anything wrong with the system. We found nothing suspicious in CPU, memory, hard disk, ports, etc.

2) Tried to dump session information through the stats socket:

    echo "show sess all" | socat /var/run/haproxy.stat stdio > /var/log/haproxy-session.log

   However, it produced a zero-byte file. When haproxy runs normally, the same command usually generates a log file of over 150 KB.

3) Tried to trace what the haproxy process was doing with:

    strace -c -p $(pidof haproxy)

   However, this returned nothing either.

4) Used GDB to step through the haproxy process, and found that haproxy loops through the following code endlessly (for details, please see the attached file GDB_haproxy.txt):

    444 in ebtree/ebtree.h
    327 in src/lb_chash.c
    330 in src/lb_chash.c
    340 in src/lb_chash.c
    341 in src/lb_chash.c
    44 in src/queue.c
    46 in src/queue.c
    53 in src/queue.c
    61 in src/queue.c
    349 in src/lb_chash.c
    325 in src/lb_chash.c
    326 in src/lb_chash.c
    326 in src/lb_chash.c
    551 in ebtree/ebtree.h
    553 in ebtree/ebtree.h
    558 in ebtree/ebtree.h
    559 in ebtree/ebtree.h

The make command we used to build haproxy 1.4.10, 1.4.23 and 1.5-dev18 is:

    make TARGET=linux2628 CPU=native USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1

This issue looks like an haproxy bug. If anyone could take a look and provide a workaround or fix, your effort would be highly appreciated.

Thanks,
-Henry
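Step 2 produced an empty file rather than an error, which makes a stuck process hard to tell apart from one that simply has nothing to report. A minimal sketch in C of querying the same stats socket with a receive timeout follows. The socket path and command are the ones from step 2; query_stats_socket() and the QUERY_* result codes are hypothetical names, not part of haproxy. It assumes the socket is used non-interactively (one command per connection, reply complete when haproxy closes the connection). When a process is stuck in a user-space loop, the kernel may still queue the connection in the listen backlog, so connect() and send() can succeed while no reply ever arrives; the timeout turns that case into an explicit result.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical result codes for query_stats_socket(). */
enum {
    QUERY_OK = 0,
    QUERY_CONNECT_FAILED = -1,
    QUERY_TIMEOUT = -2,
    QUERY_IO_ERROR = -3
};

/*
 * Send one command (e.g. "show sess all") to a UNIX stats socket and copy
 * the reply to `out` until the peer closes the connection. The number of
 * reply bytes is stored in *nread. Returns QUERY_TIMEOUT if the peer stays
 * silent for `timeout_sec` seconds.
 */
static int query_stats_socket(const char *path, const char *cmd,
                              int timeout_sec, FILE *out, size_t *nread)
{
    struct sockaddr_un addr;
    struct timeval tv;
    char buf[4096];
    int fd, rc = QUERY_OK;

    assert(path && cmd && out && nread);
    *nread = 0;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return QUERY_CONNECT_FAILED;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    /* Bound every blocking recv() so a non-responsive peer cannot hang us. */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return QUERY_CONNECT_FAILED;
    }

    /* MSG_NOSIGNAL: get EPIPE instead of SIGPIPE if the peer already closed. */
    if (send(fd, cmd, strlen(cmd), MSG_NOSIGNAL) < 0 ||
        send(fd, "\n", 1, MSG_NOSIGNAL) < 0) {
        close(fd);
        return QUERY_IO_ERROR;
    }

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            fwrite(buf, 1, (size_t)n, out);
            *nread += (size_t)n;
        } else if (n == 0) {
            break;                      /* peer closed: reply complete */
        } else if (errno == EINTR) {
            continue;                   /* interrupted by a signal: retry */
        } else {
            rc = (errno == EAGAIN || errno == EWOULDBLOCK) ? QUERY_TIMEOUT
                                                           : QUERY_IO_ERROR;
            break;
        }
    }

    close(fd);
    return rc;
}
```

For example, a small main() could call query_stats_socket("/var/run/haproxy.stat", "show sess all", 5, stdout, &n) and report QUERY_TIMEOUT instead of silently writing an empty log file.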
GDB_haproxy.txt:

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/haproxy...done.
Attaching to program: /usr/sbin/haproxy, process 11673
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /usr/lib64/libssl.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libssl.so.10
Reading symbols from /usr/lib64/libcrypto.so.10...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libcrypto.so.10
Reading symbols from /usr/lib64/libpcreposix.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libpcreposix.so.0
Reading symbols from /lib64/libpcre.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpcre.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/libgssapi_krb5.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libkrb5.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5.so.3
Reading symbols from /lib64/libcom_err.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcom_err.so.2
Reading symbols from /lib64/libk5crypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib64/libk5crypto.so.3
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libkrb5support.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkrb5support.so.0
Reading symbols from /lib64/libkeyutils.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libkeyutils.so.1
Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libselinux.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
eb_walk_down (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:444
444 ebtree/ebtree.h: No such file or directory.
    in ebtree/ebtree.h
Missing separate debuginfos, use: debuginfo-install haproxy-1.4.19-1.el6.x86_64
(gdb) bt
#0  eb_walk_down (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:444
#1  eb_next (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:561
#2  eb32_next (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/eb32tree.h:68
#3  chash_get_next_server (p=0x1ed57a0, srvtoavoid=<value optimized out>) at src/lb_chash.c:326
#4  0x000000000043ffab in assign_server (s=0x5885c00) at src/backend.c:615
#5  0x0000000000440148 in assign_server_and_queue (s=0x5885c00) at src/backend.c:791
#6  0x0000000000440291 in srv_redispatch_connect (t=0x5885c00) at src/backend.c:1030
#7  0x0000000000456894 in sess_prepare_conn_req (t=0x5886310) at src/session.c:1180
#8  process_session (t=0x5886310) at src/session.c:2198
#9  0x000000000040dab0 in process_runnable_tasks (next=0x7fff13db2c6c) at src/task.c:238
#10 0x0000000000404dd0 in run_poll_loop () at src/haproxy.c:1210
#11 0x0000000000407073 in main (argc=<value optimized out>, argv=<value optimized out>) at src/haproxy.c:1541
(gdb) eb_next (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:561
561 in ebtree/ebtree.h
(gdb) eb_walk_down (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:445
445 in ebtree/ebtree.h
(gdb) 444 in ebtree/ebtree.h
(gdb) chash_get_next_server (p=0x1ed57a0, srvtoavoid=<value optimized out>) at src/lb_chash.c:327
327 src/lb_chash.c: No such file or directory.
    in src/lb_chash.c
(gdb) 330 in src/lb_chash.c
(gdb) 340 in src/lb_chash.c
(gdb) 341 in src/lb_chash.c
(gdb) srv_dynamic_maxconn (s=0x1ef1180) at src/queue.c:44
44 src/queue.c: No such file or directory.
    in src/queue.c
(gdb) 46 in src/queue.c
(gdb) 53 in src/queue.c
(gdb) 61 in src/queue.c
(gdb) chash_get_next_server (p=0x1ed57a0, srvtoavoid=<value optimized out>) at src/lb_chash.c:349
349 src/lb_chash.c: No such file or directory.
    in src/lb_chash.c
(gdb) 325 in src/lb_chash.c
(gdb) 326 in src/lb_chash.c
(gdb) eb32_next (p=0x1ed57a0, srvtoavoid=<value optimized out>) at src/lb_chash.c:326
326 in src/lb_chash.c
(gdb) eb_next (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:551
551 ebtree/ebtree.h: No such file or directory.
    in ebtree/ebtree.h
(gdb) 553 in ebtree/ebtree.h
(gdb) 558 in ebtree/ebtree.h
(gdb) 559 in ebtree/ebtree.h
(gdb) eb_walk_down (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:444
444 in ebtree/ebtree.h
(gdb) chash_get_next_server (p=0x1ed57a0, srvtoavoid=<value optimized out>) at src/lb_chash.c:327
327 src/lb_chash.c: No such file or directory.
    in src/lb_chash.c
(gdb) 330 in src/lb_chash.c
(gdb) 340 in src/lb_chash.c
(gdb) 341 in src/lb_chash.c
(gdb) srv_dynamic_maxconn (s=0x1ee0260) at src/queue.c:44
44 src/queue.c: No such file or directory.
    in src/queue.c
(gdb) 46 in src/queue.c
(gdb) 53 in src/queue.c
(gdb) 61 in src/queue.c
(gdb) chash_get_next_server (p=0x1ed57a0, srvtoavoid=<value optimized out>) at src/lb_chash.c:349
349 src/lb_chash.c: No such file or directory.
    in src/lb_chash.c
(gdb) 325 in src/lb_chash.c
(gdb)
(gdb) 326 in src/lb_chash.c
(gdb) eb32_next (p=0x1ed57a0, srvtoavoid=<value optimized out>) at src/lb_chash.c:326
326 in src/lb_chash.c
(gdb) eb_next (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:551
551 ebtree/ebtree.h: No such file or directory.
    in ebtree/ebtree.h
(gdb) 553 in ebtree/ebtree.h
(gdb) 555 in ebtree/ebtree.h
(gdb) 553 in ebtree/ebtree.h
(gdb) 558 in ebtree/ebtree.h
(gdb) 559 in ebtree/ebtree.h
(gdb) eb_walk_down (p=0x1ed57a0, srvtoavoid=<value optimized out>) at ebtree/ebtree.h:444
444 in ebtree/ebtree.h
(gdb)
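The trace above keeps cycling through the same frames: chash_get_next_server() in src/lb_chash.c steps to the next node with eb32_next(), checks the server against srv_dynamic_maxconn() in src/queue.c, and starts over, apparently without ever returning to assign_server(). The sketch below is a hypothetical, simplified model of that kind of consistent-hash ring walk, not haproxy's actual code: a plain array stands in for the ebtree, and model_server, model_ring, model_saturated() and model_next_server() are invented names. It shows the property such a walk depends on: the loop terminates only because its stop position is a concrete point fixed before the first step, so when every server is saturated (or is the one to avoid, like srvtoavoid in the trace) it ends after one full turn. If that stop position could ever be left unset, the same `do { ... } while (i != start)` shape would spin forever, which is one possible way to produce a trace like this one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified server model (not haproxy's struct server). */
struct model_server {
    const char *name;
    unsigned int maxconn;   /* 0 means "no limit" */
    unsigned int served;    /* connections currently being served */
};

/* Hypothetical ring of hash points; each point refers to one server. */
struct model_ring {
    struct model_server **points; /* points in hash-key order */
    size_t npoints;
    size_t last;                  /* index of the last point visited */
    int has_last;                 /* 0 if no previous position is known */
};

/* Stand-in for the maxconn check: a limited server that is full is skipped. */
static int model_saturated(const struct model_server *s)
{
    return s->maxconn && s->served >= s->maxconn;
}

/*
 * Walk the ring starting just after the last visited point, skipping
 * saturated servers and `avoid`. Returns the avoided server only as a last
 * resort, or NULL if every server is saturated.
 */
static struct model_server *model_next_server(struct model_ring *r,
                                              const struct model_server *avoid)
{
    struct model_server *avoided = NULL;
    size_t start, i;

    assert(r != NULL);
    if (r->npoints == 0)
        return NULL;

    /* The stop position is always a concrete index, never "unset". */
    start = r->has_last ? (r->last + 1) % r->npoints : 0;
    i = start;
    do {
        struct model_server *s = r->points[i];

        r->last = i;
        r->has_last = 1;
        if (!model_saturated(s)) {
            if (s != avoid)
                return s;
            avoided = s;        /* usable, but keep it for later */
        }
        i = (i + 1) % r->npoints;
    } while (i != start);       /* at most one full turn of the ring */

    return avoided;
}
```

Computing `start` up front, rather than relying on a separately maintained "last" pointer as the sentinel, is what bounds each call to one turn of the ring regardless of server state.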

