We are running HAProxy 1.9.6 and managed to get into a state where HAProxy was completely unresponsive. It was pegged at 100% CPU, like many of the other reports here on the mailing list lately, but in addition it wouldn't respond to anything. Even the stats socket was unresponsive.

When I attached strace, it sat there with no activity, so the process was not making any system calls. When I attached GDB, I got the following stack:

        (gdb) bt full
        #0  htx_get_head (htx=0x7fbeb666eba0) at include/common/htx.h:357
        No locals.
        #1  h2s_htx_make_trailers (h2s=h2s@entry=0x7fbeb625f9f0, htx=htx@entry=0x7fbeb666eba0) at src/mux_h2.c:4975
                        list = {{n = {ptr = 0x0, len = 0}, v = {ptr = 0x0, len = 0}} <repeats 101 times>}
                        h2c = 0x7fbeb6372320
                        blk = <optimized out>
                        blk_end = 0x0
                        outbuf = {size = 140722044755807, area = 0x0, data = 140457080712096, head = 140457060939041}
                        h1m = {state = H1_MSG_HDR_NAME, flags = 2056, curr_len = 140457077580664, body_len = 16384, next = 2, err_pos = 0, err_state = -1237668736}
                        type = <optimized out>
                        ret = 0
                        hdr = 0
                        idx = <optimized out>
                        start = <optimized out>
        #2  0x00007fbeb50f2ef5 in h2_snd_buf (cs=0x7fbeb63ea9a0, buf=0x7fbeb6127048, count=2, flags=<optimized out>) at src/mux_h2.c:5372
                        h2s = <optimized out>
                        orig_count = <optimized out>
                        total = 15302
                        ret = <optimized out>
                        htx = 0x7fbeb666eba0
                        blk = <optimized out>
                        btype = <optimized out>
                        idx = <optimized out>
        #3  0x00007fbeb5180be4 in si_cs_send (cs=0x7fbeb63ea9a0) at src/stream_interface.c:691
                        send_flag = <optimized out>
                        conn = 0x7fbeb6051a70
                        si = 0x7fbeb6127268
                        oc = 0x7fbeb6127040
                        ret = <optimized out>
                        did_send = 0
        #4  0x00007fbeb51817c8 in si_update_both (si_f=si_f@entry=0x7fbeb6127268, si_b=si_b@entry=0x7fbeb61272a8) at src/stream_interface.c:850
                        req = 0x7fbeb6126fe0
                        res = <optimized out>
                        cs = <optimized out>
        #5  0x00007fbeb50ea2e1 in process_stream (t=<optimized out>, context=0x7fbeb6126fd0, state=<optimized out>) at src/stream.c:2502
                        srv = <optimized out>
                        s = 0x7fbeb6126fd0
                        sess = <optimized out>
                        rqf_last = <optimized out>
                        rpf_last = 3255042562
                        rq_prod_last = <optimized out>
                        rq_cons_last = <optimized out>
                        rp_cons_last = 7
                        rp_prod_last = 7
                        req_ana_back = <optimized out>
                        req = 0x7fbeb6126fe0
                        res = 0x7fbeb6127040
                        si_f = 0x7fbeb6127268
                        si_b = 0x7fbeb61272a8
        #6  0x00007fbeb51b20a8 in process_runnable_tasks () at src/task.c:434
                        t = <optimized out>
                        state = <optimized out>
                        ctx = <optimized out>
                        process = <optimized out>
                        t = <optimized out>
                        max_processed = <optimized out>
        #7  0x00007fbeb512b6ff in run_poll_loop () at src/haproxy.c:2642
                        next = <optimized out>
                        exp = <optimized out>
        #8  run_thread_poll_loop (data=data@entry=0x7fbeb5d84620) at src/haproxy.c:2707
                        ptif = <optimized out>
                        ptdf = <optimized out>
                        start_lock = 0
        #9  0x00007fbeb507d2b5 in main (argc=<optimized out>, argv=0x7ffc677d73b8) at src/haproxy.c:3343
                        tids = 0x7fbeb5d84620
                        threads = 0x7fbeb5eb6d90
                        i = <optimized out>
                        old_sig = {__val = {68097, 0, 511101108338, 0, 140722044760335, 140457059422467, 140722044760392, 140454020513805, 124, 140457064304960, 390842023936, 140457064395072, 48, 140457035994976, 18446603351664791121, 140454020513794}}
                        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
                        err = <optimized out>
                        retry = <optimized out>
                        limit = {rlim_cur = 131300, rlim_max = 131300}
                        errmsg = "\000@\000\000\000\000\000\000\002\366\210\263\276\177\000\000\300\364m\265\276\177\000\000`\227\274\263\276\177\000\000\030\000\000\000\000\000\000\000>\001\000\024\000\000\000\000p$o\265\276\177\000\000@>k\265\276\177\000\000\000\320$\265\276\177\000\000\274\276\177\000\000 t}g\374\177\000\000\000\000\000\000\000\000\000\000P\367m\265"
                        pidfd = <optimized out>

Our config is big and complex, and not something I want to post here (I may be able to provide it directly if required). However, I think the important bit is that we have a frontend and a backend which are used for load balancing gRPC traffic (thus h2). The backend servers are h2c (no SSL).
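
For illustration only, a setup of this kind has roughly the following shape. This is a minimal sketch, not the actual config: the section names, addresses, ports, and certificate path are placeholders, and TLS with ALPN on the client side is an assumption (only the cleartext h2 towards the servers is described above). In 1.9, HTTP/2 to the servers needs HTX, hence "option http-use-htx".

        frontend fe_grpc
            mode http
            option http-use-htx
            # assumption: clients negotiate h2 over TLS; cert path is a placeholder
            bind :443 ssl crt /etc/haproxy/site.pem alpn h2,http/1.1
            default_backend be_grpc

        backend be_grpc
            mode http
            option http-use-htx
            # h2c: cleartext HTTP/2 to the gRPC servers (placeholder address)
            server grpc1 192.0.2.10:50051 proto h2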

The service has been restarted, so it cannot be probed any more. However I did capture a core file before doing so.

-Patrick
