Hi Valters,

On Wed, Sep 13, 2023 at 05:34:22AM +0300, Valters Jansons wrote:
> Hello,
> 
> I was previously investigating a strange gRPC streaming issue, that
> appears to be a fairly straight forward issue with how open streams
> get half-closed.
(...)

Thanks a lot for all your investigation. I agree that something looks
odd, especially below:

> However, if the request on frontend does not have END_STREAM, then the
> backend also stays open.

Up to this point this is expected: the client is still supposed to
upload the message body, so a timeout might fire on the frontend side,
but nothing should close on the backend side.

> When the backend sends a response (H2 HEADERS
> frame) with END_STREAM, the H2 stream is updated to "half-closed
> (remote)" but it is never properly considered closed.

As annoying as it can be, this is also expected in order not to break
uploads. For example, if you have an H2->H1 gateway after haproxy,
interrupting the upload before the end would require breaking the
connection, which can be particularly expensive (i.e. it needs to be
closed and a new one re-established for the next request) and can even
be abused for denial of service.

> The server
> thinks all has been processed, and can send a RST_STREAM afterwards,
> but the actual response is not delivered to the frontend side.

Here there is a problem. If the response was provided, it must be seen
by haproxy and delivered to the other side. The RST_STREAM that comes
after the response should have no effect other than interrupting the
upload and closing the stream to the server.
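
For reference, the transitions involved can be sketched as a toy model
of the stream state machine from RFC 9113 section 5.1. This is only an
illustration in Python, not haproxy code: the names StreamState,
on_send, on_recv and backend_scenario are made up for this example, and
the model ignores everything except END_STREAM and RST_STREAM.

```python
from enum import Enum, auto


class StreamState(Enum):
    IDLE = auto()
    OPEN = auto()
    HALF_CLOSED_LOCAL = auto()
    HALF_CLOSED_REMOTE = auto()
    CLOSED = auto()


def on_send(state, frame, end_stream=False):
    """Next state after we (the proxy, acting as H2 client) send a frame."""
    if frame == "RST_STREAM":
        return StreamState.CLOSED
    if frame == "HEADERS" and state is StreamState.IDLE:
        state = StreamState.OPEN
    if end_stream and frame in ("HEADERS", "DATA"):
        if state is StreamState.OPEN:
            return StreamState.HALF_CLOSED_LOCAL
        if state is StreamState.HALF_CLOSED_REMOTE:
            return StreamState.CLOSED
    return state


def on_recv(state, frame, end_stream=False):
    """Next state after the server sends us a frame."""
    if frame == "RST_STREAM":
        return StreamState.CLOSED
    if end_stream and frame in ("HEADERS", "DATA"):
        if state is StreamState.OPEN:
            return StreamState.HALF_CLOSED_REMOTE
        if state is StreamState.HALF_CLOSED_LOCAL:
            return StreamState.CLOSED
    return state


def backend_scenario():
    """Request without END_STREAM, then response with END_STREAM, then RST."""
    trace = []
    state = on_send(StreamState.IDLE, "HEADERS")          # upload still pending
    trace.append(state)
    state = on_recv(state, "HEADERS", end_stream=True)    # complete response
    trace.append(state)
    state = on_recv(state, "RST_STREAM")                  # server stops the upload
    trace.append(state)
    return trace
```

In this model the stream goes open, then half-closed (remote), then
closed, and the response is already complete before the RST_STREAM
arrives. Had the request carried END_STREAM, the stream would have been
half-closed (local) first and the same response would have closed it
directly.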

> Instead, HTTP 502 is sent to the original requestor, and the session
> disconnect state is reported as SH--.

I *suspect* that what is happening is that during the body forwarding
of the request (even if no data is being uploaded), we're seeing a
closed stream on the backend side and refraining from going further.
There might be a problem around this area. A response arriving before
the request completes is often tricky to handle correctly. It's what I
usually call "redirect-on-post", because it's the same principle as a
client uploading data (e.g. a POST on a webmail) and the server
redirecting to the login page due to an expired session. I guess you're
facing a combination of timing and events that causes the whole
request/response processing to be aborted.
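
To make the expected ordering concrete, here is a minimal sketch of
what the client should end up seeing, depending on what the backend
side observed. It is a toy model in Python under stated assumptions,
not a description of haproxy's internals: client_view and the event
names "response:<status>" and "reset" are hypothetical, and mapping a
reset without any response to a 502 is an assumption made for the
example.

```python
def client_view(backend_events):
    """Return (status_for_client, upload_still_forwarded).

    backend_events is an ordered list of hypothetical event names:
    "response:<status>" for a complete response from the server,
    "reset" for an RST_STREAM received from the server.
    """
    status = None
    uploading = True
    for event in backend_events:
        if event.startswith("response:") and status is None:
            # A complete response was seen: it must reach the client,
            # whatever happens to the stream afterwards.
            status = int(event.split(":", 1)[1])
        elif event == "reset":
            # The reset only stops the upload towards the server.
            uploading = False
            if status is None:
                # Assumption: no response at all is reported as a 502.
                status = 502
    return status, uploading
```

With this model, client_view(["response:200", "reset"]) keeps the 200
and only stops the upload, whereas the behaviour reported in this
thread corresponds to the 502 case even though a response was
provided.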

> In my scenario, there is only a HEADERS frame, so
> `h2c_bck_handle_headers` can be modified as to `h2s_close` on
> `H2_SS_HREM` (starting out as `H2_SS_OPEN`) in addition to
> `H2_SS_HLOC`. This is a hacky solution however, and would not address
> a DATA frame having the same issue. Instead, the actual response
> should be properly processed when the backend remote closes its side
> (instead of waiting on frontend to close its side).
> 
> I will try to loop back around to this issue, with a patch. But that
> will most likely take time from my side both due to limited personal
> bandwidth and unfamiliarity with the H2 processing. Anyone willing to
> provide a quicker patch is appreciated!

No problem. As you rightly said, we need to figure out the root cause
rather than work around the problem with a hacky solution. There
definitely is an issue here, so we'll have a look.

Thanks for all your details!
Willy
