Hi Patrick,

On Fri, May 03, 2019 at 04:33:07PM -0400, Patrick Hemmer wrote:
> We are running HAProxy 1.9.6 and managed to get into a state where HAProxy
> was completely unresponsive. It was pegged at 100% like many of the other
> experiences here on the mailing list lately. But in addition it wouldn't
> respond to anything. The stats socket wasn't even responsive.
> 
> When I attached an strace, it sat there with no activity. When I attached
> GDB I got the following stack:
(...)
> Our config is big and complex, and not something I want to post here (I may
> be able to provide directly if required). However I think the important bit
> is that we have a frontend and backend which are used for load balancing
> gRPC traffic (thus h2). The backend servers are h2c (no SSL).
(...)

Function h2s_htx_make_trailers() is called in loops here, and I see no way
this function can return without consuming the block, marking an error or
indicating that it's blocked. Thus I suspect this one could be a consequence
of the bug fixed by commit 9a0f559 ("BUG/MEDIUM: h2: Make sure we're not
already in the send_list in h2_subscribe().") which was backported into
1.9.7. Do not rush an upgrade though, I'm going to issue 1.9.8 soon with
a few more fixes.
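The reasoning above rests on a contract: every call must either consume the block, report an error, or report that it is blocked. Below is a minimal C model of a caller loop relying on that contract, with an added progress check so that a handler breaking the contract yields an error instead of a 100% CPU spin. All names (`enum emit_ret`, `emit_fn`, `drain()`, the two emitters) are hypothetical and are not HAProxy's HTX or H2 mux API.

```c
#include <stddef.h>

/* Hypothetical outcomes a per-block emitter may report (not HAProxy's API). */
enum emit_ret {
    EMIT_DONE,    /* block consumed, *pos advanced */
    EMIT_ERROR,   /* stream must be aborted */
    EMIT_BLOCKED, /* output full, caller must retry later */
};

/* An emitter handles the block at *pos and advances *pos when it consumes it. */
typedef enum emit_ret (*emit_fn)(const int *blks, size_t count, size_t *pos);

/* Drain all blocks. A DONE that leaves *pos unchanged is turned into an
 * error, because looping again would call the emitter on the same block
 * forever. */
enum emit_ret drain(const int *blks, size_t count, emit_fn emit)
{
    size_t pos = 0;

    while (pos < count) {
        size_t before = pos;
        enum emit_ret ret = emit(blks, count, &pos);

        if (ret != EMIT_DONE)
            return ret;        /* error or blocked: leave the loop */
        if (pos == before)
            return EMIT_ERROR; /* claimed success without progress */
    }
    return EMIT_DONE;
}

/* Well-behaved emitter: always consumes exactly one block. */
enum emit_ret emit_one(const int *blks, size_t count, size_t *pos)
{
    (void)blks;
    (void)count;
    (*pos)++;
    return EMIT_DONE;
}

/* Misbehaving emitter: on block value 2 (standing in for a trailers block)
 * it claims success without consuming anything. */
enum emit_ret emit_stuck_on_2(const int *blks, size_t count, size_t *pos)
{
    (void)count;
    if (blks[*pos] == 2)
        return EMIT_DONE;
    (*pos)++;
    return EMIT_DONE;
}
```

With `{1, 2, 3}`, `drain()` returns `EMIT_DONE` using `emit_one()` but `EMIT_ERROR` using `emit_stuck_on_2()`, where an unguarded loop would spin forever.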

With this said, after studying the code a little bit more, I'm seeing a
potential case where, if we had a trailers entry in the HTX buffer but no
end of message, we could loop forever there without consuming this block.
I have no idea if this is possible in an HTX message, I'll ask Christopher
tomorrow. In any case we need to address this one way or another, possibly
reporting an error instead if required. Thus I'm postponing 1.9.8 until
tomorrow.
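One way to handle the trailers-without-end-of-message case is to report an error rather than leave the block in place. The sketch below assumes, hypothetically, that a message is a flat array of block types and that trailers are always emitted together with the end of message. `enum blk`, `make_trailers()`, `room` and `need` are illustrative names only, not the real signature of `h2s_htx_make_trailers()`, and whether such an HTX sequence can occur at all is still the open question.

```c
#include <stddef.h>

/* Hypothetical block types in a flattened message (not HAProxy's HTX types). */
enum blk { BLK_DATA, BLK_TLR, BLK_EOM };

enum tlr_ret {
    TLR_DONE,    /* trailers and end of message consumed */
    TLR_ERROR,   /* malformed sequence reported instead of looping */
    TLR_BLOCKED, /* not enough output room, retry later */
};

/* Handle the trailers block at *pos. `room` is the (hypothetical) free space
 * in the output buffer and `need` the space the trailers frame would take.
 * *pos is only advanced when the blocks are actually consumed. */
enum tlr_ret make_trailers(const enum blk *blks, size_t count, size_t *pos,
                           size_t room, size_t need)
{
    if (*pos >= count || blks[*pos] != BLK_TLR)
        return TLR_ERROR;   /* caller contract violated */
    if (*pos + 1 >= count || blks[*pos + 1] != BLK_EOM)
        return TLR_ERROR;   /* trailers without end of message */
    if (room < need)
        return TLR_BLOCKED; /* wait for buffer room, nothing consumed */
    *pos += 2;              /* consume trailers and end of message together */
    return TLR_DONE;
}
```

Every return path either advances `*pos`, reports an error, or reports being blocked, which is exactly the property the caller's loop depends on.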

> The service has been restarted, so it cannot be probed any more. However I
> did capture a core file before doing so.

That might actually be useful for studying the sequence of HTX messages
there. I may ask you to dig a little bit into it.

Thanks!
Willy
