Hi Valters,

On Wed, Sep 13, 2023 at 05:34:22AM +0300, Valters Jansons wrote:
> Hello,
>
> I was previously investigating a strange gRPC streaming issue, that
> appears to be a fairly straight forward issue with how open streams
> get half-closed.
(...)

Thanks a lot for all your investigation. I agree that something looks odd,
especially below:

> However, if the request on frontend does not have END_STREAM, then the
> backend also stays open.

Up to this point this is expected, since the client is expected to upload
the message's body, so a timeout might fire on the frontend side, but
nothing should close on the backend side.

> When the backend sends a response (H2 HEADERS
> frame) with END_STREAM, the H2 stream is updated to "half-closed
> (remote)" but it is never properly considered closed.

As annoying as it can be, this is also expected in order not to break
uploads. For example if you have an H2->H1 gateway after haproxy,
interrupting the upload before the end would require breaking the
connection, which can be particularly expensive (i.e. you need to close it
and re-establish a new one for the next request) and can even be abused for
denial of service.

> The server
> thinks all has been processed, and can send a RST_STREAM afterwards,
> but the actual response is not delivered to the frontend side.

Here there is a problem. If the response was provided, it must be seen by
haproxy and delivered to the other side. And the RST_STREAM which comes
after the response should have no other effect than interrupting the
upload and closing the stream to the server.

> Instead, HTTP 502 is sent to the original requestor, and the session
> disconnect state is reported as SH--.

I *suspect* that what is happening is that during the body forwarding of
the request (even if no data is being uploaded), we're seeing a closed
stream on the backend side and refraining from going further. There might
be a problem around this area. The response-before-completion case is often
tricky to handle correctly; that's what I usually call the
"redirect-on-post" because it's the same principle as a client uploading
data (e.g. a post on a webmail) and the server redirecting to the login
page due to an expired session.
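
To make the stream states discussed above concrete, here is a minimal
Python sketch of the half-close transitions as RFC 9113 describes them,
seen from the side that sent the request (haproxy on the backend
connection). It is only an illustration and not haproxy's code: the
`StreamState` enum, the event names and the `next_state`/`replay` helpers
are all made up for this example.

```python
from enum import Enum, auto


class StreamState(Enum):
    OPEN = auto()                # both sides may still send
    HALF_CLOSED_LOCAL = auto()   # we sent END_STREAM, the peer may still send
    HALF_CLOSED_REMOTE = auto()  # the peer sent END_STREAM, we may still send
    CLOSED = auto()


def next_state(state, event):
    """Return the state reached from `state` after `event`.

    Hypothetical event names: "send_es"/"recv_es" for a frame carrying
    END_STREAM, "send_rst"/"recv_rst" for an RST_STREAM frame.
    """
    if state is StreamState.CLOSED:
        return state
    if event in ("send_rst", "recv_rst"):
        # A reset closes the stream regardless of which half was still open.
        return StreamState.CLOSED
    if event == "recv_es":
        if state is StreamState.OPEN:
            return StreamState.HALF_CLOSED_REMOTE
        if state is StreamState.HALF_CLOSED_LOCAL:
            return StreamState.CLOSED
    if event == "send_es":
        if state is StreamState.OPEN:
            return StreamState.HALF_CLOSED_LOCAL
        if state is StreamState.HALF_CLOSED_REMOTE:
            return StreamState.CLOSED
    return state


def replay(events, state=StreamState.OPEN):
    """Apply the events in order and return every state that was reached."""
    states = []
    for event in events:
        state = next_state(state, event)
        states.append(state)
    return states


if __name__ == "__main__":
    # The request HEADERS were sent without END_STREAM, so we start in OPEN.
    # The server answers with END_STREAM, then sends RST_STREAM.
    for reached in replay(["recv_es", "recv_rst"]):
        print(reached.name)
```

In the reported scenario the stream goes through "half-closed (remote)"
before the RST_STREAM closes it, so the response was already complete by
the time the reset arrived.
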
I guess you're facing a combination of timing and events that causes the
whole request/response processing to be aborted.

> In my scenario, there is only a HEADERS frame, so
> `h2c_bck_handle_headers` can be modified as to `h2s_close` on
> `H2_SS_HREM` (starting out as `H2_SS_OPEN`) in addition to
> `H2_SS_HLOC`. This is a hacky solution however, and would not address
> a DATA frame having the same issue. Instead, the actual response
> should be properly processed when the backend remote closes its side
> (instead of waiting on frontend to close its side).
>
> I will try to loop back around to this issue, with a patch. But that
> will most likely take time from my side both due to limited personal
> bandwidth and unfamiliarity with the H2 processing. Anyone willing to
> provide a quicker patch is appreciated!

No problem. As you rightfully said, we need to figure out the root cause
rather than try to work around the problem with a hacky solution. There
definitely is an issue here so we'll have a look.

Thanks for all your details!
Willy