------------------------------------------------------------------------
*From:* Willy Tarreau [mailto:w...@1wt.eu]
*Sent:* Monday, May 6, 2019, 08:42 EDT
*To:* Patrick Hemmer <hapr...@stormcloud9.net>
*Cc:* haproxy@formilux.org
*Subject:* HAProxy 1.9.6 unresponsive

On Sun, May 05, 2019 at 09:40:02AM +0200, Willy Tarreau wrote:
With this said, after studying the code a little bit more, I'm seeing a
potential case where if we'd have a trailers entry in the HTX buffer but
no end of message, we could loop forever there not consuming this block.
I have no idea if this is possible in an HTX message, I'll ask Christopher
tomorrow. In any case we need to address this one way or another, possibly
reporting an error instead if required. Thus I'm postponing 1.9.8 for
tomorrow.
So the case is indeed possible and at the moment all we can do is try to
minimize the probability to produce it :-(  The issue is caused by the
moment we've received the end of trailsers but not the end of the mesage.
 From the H2 protocol perspective if we've sent the END_STREAM flag, the
stream is closed, and a closed stream gets detached and cannot receive
new traffic, so at best we'll occasionally close too early and report
client failures at the upper layers while everything went OK. We cannot
send trailers without the END_STREAM flag since no frame may follow.
Abusing CONTINUATION is out of question here as this would require to
completely freeze the whole connection (including control frames) for
the time it takes to get this final EOM block. I thought about simply
reporting an error when we're in this situation between trailers and EOM
but it will mean that occasionally some chunked responses of sizes close
to 16N kB with trailers may err out, which is not acceptable either.

For 2.0 we approximately see what needs to be modified to address this
situation, but that will not be trivial and not backportable.

For 1.9 I'm still trying to figure what the "best" solution is. I may
finally end up marking the stream as closed as soon as we see the
trailers pushed down. I'm just unsure right now about all the possible
consequences and need to study the edge cases. Also I fear that this
will be something hard to unroll later, so I'm still studying.

Willy

Just in case it's useful, we had the issue recur today. However I gleaned a little more information from this recurrence. Provided below are several outputs from a gdb `bt full`. The important bit is that in the captures, the last frame which doesn't change between each capture is the `si_cs_send` function. The last stack capture provided has the shortest stack depth of all the captures, and is inside `h2_snd_buf`.

Otherwise it's still the behavior is the same as last time with `strace` showing absolutely nothing, so it's still looping.




#0  h1_headers_to_hdr_list (start=0x7f5a4ea6b5fb "grpco\243?", stop=0x7f5a4ea6b5ff "o\243?", hdr=hdr@entry=0x7ffdc58f6400, hdr_num=hdr_num@entry=101, h1m=h1m@entry=0x7ffdc58f63d0, slp=slp@entry=0x0) at src/h1.c:793
        ret = <optimized out>
        state = <optimized out>
        ptr = <optimized out>
        end = <optimized out>
        hdr_count = <optimized out>
        skip = 0
        sol = <optimized out>
        col = <optimized out>
        eol = <optimized out>
        sov = <optimized out>
        sl = <optimized out>
        skip_update = <optimized out>
        restarting = <optimized out>
        n = <optimized out>
        v = {ptr = 0x7f5a4eb51453 "LZ\177", len = 140025825685243}
#1  0x00007f5a4d862539 in h2s_htx_make_trailers (h2s=h2s@entry=0x7f5a4ecc7860, htx=htx@entry=0x7f5a4ea67630) at src/mux_h2.c:4996         list = {{n = {ptr = 0x0, len = 0}, v = {ptr = 0x0, len = 0}} <repeats 101 times>}
        h2c = 0x7f5a4ec56610
        blk = <optimized out>
        blk_end = 0x0
        outbuf = {size = 140025844274259, area = 0x7f5a4d996efb <BIO_write+107> "\205\300~\aHc\320H\001SXH\205\355t\026Lc\310E1\300D\211\351L\211⾃", data = 16472, head = 140025845781936}         h1m = {state = H1_MSG_HDR_NAME, flags = 2056, curr_len = 0, body_len = 0, next = 4, err_pos = 0, err_state = 1320431563}
        type = <optimized out>
        ret = <optimized out>
        hdr = 0
        idx = 5
        start = <optimized out>
#2  0x00007f5a4d866ef5 in h2_snd_buf (cs=0x7f5a4e9a8980, buf=0x7f5a4e777d78, count=4, flags=<optimized out>) at src/mux_h2.c:5372
        h2s = <optimized out>
        orig_count = <optimized out>
        total = 16291
        ret = <optimized out>
        htx = 0x7f5a4ea67630
        blk = <optimized out>
        btype = <optimized out>
        idx = <optimized out>
#3  0x00007f5a4d8f4be4 in si_cs_send (cs=cs@entry=0x7f5a4e9a8980) at src/stream_interface.c:691
        send_flag = <optimized out>
        conn = 0x7f5a4e86f4c0
        si = 0x7f5a4e777f98
        oc = 0x7f5a4e777d70
        ret = <optimized out>
        did_send = 0
#4  0x00007f5a4d8f6305 in si_cs_io_cb (t=<optimized out>, ctx=0x7f5a4e777f98, state=<optimized out>) at src/stream_interface.c:737
        si = 0x7f5a4e777f98
        cs = 0x7f5a4e9a8980
        ret = 0
#5  0x00007f5a4d925f02 in process_runnable_tasks () at src/task.c:437
        t = <optimized out>
        state = <optimized out>
        ctx = <optimized out>
        process = <optimized out>
        t = <optimized out>
        max_processed = <optimized out>
#6  0x00007f5a4d89f6ff in run_poll_loop () at src/haproxy.c:2642
        next = <optimized out>
        exp = <optimized out>
#7  run_thread_poll_loop (data=data@entry=0x7f5a4e62a9b0) at src/haproxy.c:2707
        ptif = <optimized out>
        ptdf = <optimized out>
        start_lock = 0
#8  0x00007f5a4d7f12b5 in main (argc=<optimized out>, argv=0x7ffdc58f7738) at src/haproxy.c:3343
        tids = 0x7f5a4e62a9b0
        threads = 0x7f5a4e62ae10
        i = <optimized out>
        old_sig = {__val = {68097, 0, 511101108338, 0, 140727917966479, 140025825677571, 140727917966536, 140024523784205, 124, 140025830560064, 390842023936, 140025830650176, 48, 140025802250080, 18446603345791584977, 140024523784194}}         blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 131300, rlim_max = 131300}
        errmsg = "\000@\000\000\000\000\000\000\002\066\000LZ\177\000\000\300\064\345MZ\177\000\000`\327\063LZ\177\000\000\030\000\000\000\000\000\000\000>\001\000\024\000\000\000\000pd\346MZ\177\000\000@~\342MZ\177\000\000\000\020\234MZ\177\000\000\274/\000LZ\177\000\000\240w\217\305\375\177\000\000\000\000\000\000\000\000\000\000P7\345M"
        pidfd = <optimized out>












#0  0x00007f5a4d86247e in memset (__len=3232, __ch=0, __dest=0x7ffdc58f6400) at /usr/include/bits/string3.h:84
No locals.
#1  h2s_htx_make_trailers (h2s=h2s@entry=0x7f5a4ecc7860, htx=htx@entry=0x7f5a4ea67630) at src/mux_h2.c:4972         list = {{n = {ptr = 0x0, len = 0}, v = {ptr = 0x0, len = 0}} <repeats 101 times>}
        h2c = 0x7f5a4ec56610
        blk = <optimized out>
        blk_end = <optimized out>
        outbuf = {size = 140025844274259, area = 0x7f5a4d996efb <BIO_write+107> "\205\300~\aHc\320H\001SXH\205\355t\026Lc\310E1\300D\211\351L\211⾃", data = 16472, head = 140025845781936}         h1m = {state = H1_MSG_HDR_NAME, flags = 2056, curr_len = 0, body_len = 0, next = 4, err_pos = 0, err_state = 1320431563}
        type = <optimized out>
        ret = 0
        hdr = 0
        idx = <optimized out>
        start = <optimized out>
#2  0x00007f5a4d866ef5 in h2_snd_buf (cs=0x7f5a4e9a8980, buf=0x7f5a4e777d78, count=4, flags=<optimized out>) at src/mux_h2.c:5372
        h2s = <optimized out>
        orig_count = <optimized out>
        total = 16291
        ret = <optimized out>
        htx = 0x7f5a4ea67630
        blk = <optimized out>
        btype = <optimized out>
        idx = <optimized out>
#3  0x00007f5a4d8f4be4 in si_cs_send (cs=cs@entry=0x7f5a4e9a8980) at src/stream_interface.c:691
        send_flag = <optimized out>
        conn = 0x7f5a4e86f4c0
        si = 0x7f5a4e777f98
        oc = 0x7f5a4e777d70
        ret = <optimized out>
        did_send = 0
#4  0x00007f5a4d8f6305 in si_cs_io_cb (t=<optimized out>, ctx=0x7f5a4e777f98, state=<optimized out>) at src/stream_interface.c:737
        si = 0x7f5a4e777f98
        cs = 0x7f5a4e9a8980
        ret = 0
#5  0x00007f5a4d925f02 in process_runnable_tasks () at src/task.c:437
        t = <optimized out>
        state = <optimized out>
        ctx = <optimized out>
        process = <optimized out>
        t = <optimized out>
        max_processed = <optimized out>
#6  0x00007f5a4d89f6ff in run_poll_loop () at src/haproxy.c:2642
        next = <optimized out>
        exp = <optimized out>
#7  run_thread_poll_loop (data=data@entry=0x7f5a4e62a9b0) at src/haproxy.c:2707
        ptif = <optimized out>
        ptdf = <optimized out>
        start_lock = 0
#8  0x00007f5a4d7f12b5 in main (argc=<optimized out>, argv=0x7ffdc58f7738) at src/haproxy.c:3343
        tids = 0x7f5a4e62a9b0
        threads = 0x7f5a4e62ae10
        i = <optimized out>
        old_sig = {__val = {68097, 0, 511101108338, 0, 140727917966479, 140025825677571, 140727917966536, 140024523784205, 124, 140025830560064, 390842023936, 140025830650176, 48, 140025802250080, 18446603345791584977, 140024523784194}}         blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 131300, rlim_max = 131300}
        errmsg = "\000@\000\000\000\000\000\000\002\066\000LZ\177\000\000\300\064\345MZ\177\000\000`\327\063LZ\177\000\000\030\000\000\000\000\000\000\000>\001\000\024\000\000\000\000pd\346MZ\177\000\000@~\342MZ\177\000\000\000\020\234MZ\177\000\000\274/\000LZ\177\000\000\240w\217\305\375\177\000\000\000\000\000\000\000\000\000\000P7\345M"
        pidfd = <optimized out>









#0  0x00007f5a4d8fe612 in h1_headers_to_hdr_list (start=0x7f5a4ea6b5fb "grpco\243?", stop=0x7f5a4ea6b5ff "o\243?", hdr=hdr@entry=0x7ffdc58f6400, hdr_num=hdr_num@entry=101, h1m=h1m@entry=0x7ffdc58f63d0, slp=slp@entry=0x0) at src/h1.c:643
        state = <optimized out>
        ptr = 0x7f5a4ea6b5fc "rpco\243?"
        end = 0x7f5a4ea6b5ff "o\243?"
        hdr_count = 0
        skip = 0
        sol = 0
        col = <optimized out>
        eol = <optimized out>
        sov = <optimized out>
        sl = {rq = {m = {ptr = 0x7f5a4eb42ecc "ll pet treats](https://www.chewy.com/b/treats-988). [Kaytee Natural Timothy Hay Small Animal Food](https://www.chewy.com/kaytee-natural-timothy-hay-small/dp/123134) delivers a beneficial, high-fiber b"..., len = 140025825693206}, u = {ptr = 0x7c <Address 0x7c out of bounds>, len = 140025844215504}, v = {ptr = <optimized out>, len = <optimized out>}, meth = <optimized out>}, st = {v = {ptr = 0x7f5a4eb42ecc "ll pet treats](https://www.chewy.com/b/treats-988). [Kaytee Natural Timothy Hay Small Animal Food](https://www.chewy.com/kaytee-natural-timothy-hay-small/dp/123134) delivers a beneficial, high-fiber b"..., len = 140025825693206}, c = {ptr = 0x7c <Address 0x7c out of bounds>, len = 140025844215504}, r = {ptr = <optimized out>, len = <optimized out>}, status = <optimized out>}}
        skip_update = 0
        restarting = 0
        n = <optimized out>
        v = {ptr = 0x7f5a4eb51453 "LZ\177", len = 140025825685243}
#1  0x00007f5a4d862539 in h2s_htx_make_trailers (h2s=h2s@entry=0x7f5a4ecc7860, htx=htx@entry=0x7f5a4ea67630) at src/mux_h2.c:4996         list = {{n = {ptr = 0x0, len = 0}, v = {ptr = 0x0, len = 0}} <repeats 101 times>}
        h2c = 0x7f5a4ec56610
        blk = <optimized out>
        blk_end = 0x0
        outbuf = {size = 140025844274259, area = 0x7f5a4d996efb <BIO_write+107> "\205\300~\aHc\320H\001SXH\205\355t\026Lc\310E1\300D\211\351L\211⾃", data = 16472, head = 140025845781936}         h1m = {state = H1_MSG_HDR_NAME, flags = 2056, curr_len = 0, body_len = 0, next = 0, err_pos = 0, err_state = 1320431563}
        type = <optimized out>
        ret = <optimized out>
        hdr = 0
        idx = 5
        start = <optimized out>
#2  0x00007f5a4d866ef5 in h2_snd_buf (cs=0x7f5a4e9a8980, buf=0x7f5a4e777d78, count=4, flags=<optimized out>) at src/mux_h2.c:5372
        h2s = <optimized out>
        orig_count = <optimized out>
        total = 16291
        ret = <optimized out>
        htx = 0x7f5a4ea67630
        blk = <optimized out>
        btype = <optimized out>
        idx = <optimized out>
#3  0x00007f5a4d8f4be4 in si_cs_send (cs=cs@entry=0x7f5a4e9a8980) at src/stream_interface.c:691
        send_flag = <optimized out>
        conn = 0x7f5a4e86f4c0
        si = 0x7f5a4e777f98
        oc = 0x7f5a4e777d70
        ret = <optimized out>
        did_send = 0
#4  0x00007f5a4d8f6305 in si_cs_io_cb (t=<optimized out>, ctx=0x7f5a4e777f98, state=<optimized out>) at src/stream_interface.c:737
        si = 0x7f5a4e777f98
        cs = 0x7f5a4e9a8980
        ret = 0
#5  0x00007f5a4d925f02 in process_runnable_tasks () at src/task.c:437
        t = <optimized out>
        state = <optimized out>
        ctx = <optimized out>
        process = <optimized out>
        t = <optimized out>
        max_processed = <optimized out>
#6  0x00007f5a4d89f6ff in run_poll_loop () at src/haproxy.c:2642
        next = <optimized out>
        exp = <optimized out>
#7  run_thread_poll_loop (data=data@entry=0x7f5a4e62a9b0) at src/haproxy.c:2707
        ptif = <optimized out>
        ptdf = <optimized out>
        start_lock = 0
#8  0x00007f5a4d7f12b5 in main (argc=<optimized out>, argv=0x7ffdc58f7738) at src/haproxy.c:3343
        tids = 0x7f5a4e62a9b0
        threads = 0x7f5a4e62ae10
        i = <optimized out>
        old_sig = {__val = {68097, 0, 511101108338, 0, 140727917966479, 140025825677571, 140727917966536, 140024523784205, 124, 140025830560064, 390842023936, 140025830650176, 48, 140025802250080, 18446603345791584977, 140024523784194}}         blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 131300, rlim_max = 131300}
        errmsg = "\000@\000\000\000\000\000\000\002\066\000LZ\177\000\000\300\064\345MZ\177\000\000`\327\063LZ\177\000\000\030\000\000\000\000\000\000\000>\001\000\024\000\000\000\000pd\346MZ\177\000\000@~\342MZ\177\000\000\000\020\234MZ\177\000\000\274/\000LZ\177\000\000\240w\217\305\375\177\000\000\000\000\000\000\000\000\000\000P7\345M"
        pidfd = <optimized out>












#0  0x00007f5a4d866ba8 in h2_snd_buf (cs=0x7f5a4e9a8980, buf=0x7f5a4e777d78, count=4, flags=<optimized out>) at src/mux_h2.c:5321
        h2s = <optimized out>
        orig_count = <optimized out>
        total = 16291
        ret = <optimized out>
        htx = 0x7f5a4ea67630
        blk = <optimized out>
        btype = <optimized out>
        idx = <optimized out>
#1  0x00007f5a4d8f4be4 in si_cs_send (cs=cs@entry=0x7f5a4e9a8980) at src/stream_interface.c:691
        send_flag = <optimized out>
        conn = 0x7f5a4e86f4c0
        si = 0x7f5a4e777f98
        oc = 0x7f5a4e777d70
        ret = <optimized out>
        did_send = 0
#2  0x00007f5a4d8f6305 in si_cs_io_cb (t=<optimized out>, ctx=0x7f5a4e777f98, state=<optimized out>) at src/stream_interface.c:737
        si = 0x7f5a4e777f98
        cs = 0x7f5a4e9a8980
        ret = 0
#3  0x00007f5a4d925f02 in process_runnable_tasks () at src/task.c:437
        t = <optimized out>
        state = <optimized out>
        ctx = <optimized out>
        process = <optimized out>
        t = <optimized out>
        max_processed = <optimized out>
#4  0x00007f5a4d89f6ff in run_poll_loop () at src/haproxy.c:2642
        next = <optimized out>
        exp = <optimized out>
#5  run_thread_poll_loop (data=data@entry=0x7f5a4e62a9b0) at src/haproxy.c:2707
        ptif = <optimized out>
        ptdf = <optimized out>
        start_lock = 0
#6  0x00007f5a4d7f12b5 in main (argc=<optimized out>, argv=0x7ffdc58f7738) at src/haproxy.c:3343
        tids = 0x7f5a4e62a9b0
        threads = 0x7f5a4e62ae10
        i = <optimized out>
        old_sig = {__val = {68097, 0, 511101108338, 0, 140727917966479, 140025825677571, 140727917966536, 140024523784205, 124, 140025830560064, 390842023936, 140025830650176, 48, 140025802250080, 18446603345791584977, 140024523784194}}         blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = <optimized out>
        retry = <optimized out>
        limit = {rlim_cur = 131300, rlim_max = 131300}
        errmsg = "\000@\000\000\000\000\000\000\002\066\000LZ\177\000\000\300\064\345MZ\177\000\000`\327\063LZ\177\000\000\030\000\000\000\000\000\000\000>\001\000\024\000\000\000\000pd\346MZ\177\000\000@~\342MZ\177\000\000\000\020\234MZ\177\000\000\274/\000LZ\177\000\000\240w\217\305\375\177\000\000\000\000\000\000\000\000\000\000P7\345M"
        pidfd = <optimized out>

Reply via email to