Re: [3.0.5] Unexpected SD-- on (almost) successful requests

Luke Seelenbinder Thu, 26 Sep 2024 08:19:58 -0700

Hi Christopher,

Thanks for the response.


> Sorry, I don't understand: was the response successfully sent to the client 
> when this happens, or not? Is it "just" an issue with the termination state, 
> or is there also an issue with the response itself?

It's also an issue with the response. The chain is:

Varnish (status: 503) -> HAProxy (status: 200, termination: SD--) -> upstream 
HAProxy (status: 200, termination: ----)
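To gauge how often a 2xx status is paired with `SD--` on the middle HAProxy, a small script over its logs might help narrow things down. This is a minimal Python sketch that assumes HAProxy's default `option httplog` line layout (status code, bytes read, two captured-cookie fields, then the four-character termination state); a custom `log-format` would need a different pattern, and the log path in the usage line is hypothetical.

```python
import re
from collections import Counter

# Assumes the default `option httplog` layout, where the status code, bytes
# read, captured request/response cookies and the 4-char termination state
# appear in that order, e.g.:
#   ... 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 "GET / HTTP/1.1"
HTTPLOG_FIELDS = re.compile(
    r"\s(?P<status>\d{3})\s(?P<bytes>\d+)\s\S+\s\S+\s(?P<tsc>[A-Za-z-]{4})\s"
)


def count_truncated_successes(lines):
    """Count 2xx responses whose termination state starts with 'SD'.

    Per HAProxy's termination-state codes, 'S' means the server aborted or
    refused the session and 'D' means it happened in the DATA phase.
    """
    counts = Counter()
    for line in lines:
        match = HTTPLOG_FIELDS.search(line)
        if match is None:
            counts["unparsed"] += 1
            continue
        status = int(match.group("status"))
        if 200 <= status < 300 and match.group("tsc").startswith("SD"):
            counts["2xx_SD"] += 1
        else:
            counts["other"] += 1
    return counts
```

Usage would be along the lines of `with open("/var/log/haproxy.log") as f: print(count_truncated_successes(f))`. Counting unparsed lines separately makes it obvious when the assumed log layout does not match.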

> At first glance, there are not many fixes that could explain that. Maybe the 
> following one, but I'm not sure:

I had the same thought…nothing really made sense to me either.

I'll try with `-dZ` and report back!
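Alongside the `-dZ` test, one possible way to probe the hypothesis of a server shutdown caught too early, because it arrives together with the last bytes of data, is a throwaway upstream that writes a complete response and shuts down its side immediately. The Python sketch below is hypothetical and not part of HAProxy: it speaks plain HTTP/1.1 without TLS (unlike the real client-cert TLS backend), and the FIN is only likely, not guaranteed, to reach the peer together with the last data bytes.

```python
import socket
import threading

BODY = b"x" * 1024  # arbitrary payload for the test response


def serve_once(listener: socket.socket) -> None:
    """Hypothetical upstream: answer one request, then close right away."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        # Read until the end of the request headers (no request body expected).
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                return
            request += chunk
        response = (
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: " + str(len(BODY)).encode() + b"\r\n"
            b"Connection: close\r\n"
            b"\r\n" + BODY
        )
        # Write the whole response and send the FIN immediately afterwards, so
        # the shutdown tends to reach the peer together with the last bytes.
        conn.sendall(response)
        conn.shutdown(socket.SHUT_WR)


def fetch(host: str, port: int) -> tuple[int, bytes]:
    """Minimal client: send one GET and read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"GET / HTTP/1.1\r\nHost: test\r\nConnection: close\r\n\r\n")
        raw = b""
        while chunk := sock.recv(65536):
            raw += chunk
    head, _, body = raw.partition(b"\r\n\r\n")
    status = int(head.split(b" ", 2)[1])
    return status, body
```

In a test setup, `serve_once` could be called in a loop on a listening socket used as a backend `server` address, with `fetch` sending requests through the proxy while the termination states in its logs are watched.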

Best,
Luke

—
Luke Seelenbinder
Stadia Maps | Founder & CEO
stadiamaps.com

> On Sep 26, 2024, at 16:28, Christopher Faulet <[email protected]> wrote:
> 
> Hi Luke,
> 
> Le 26/09/2024 à 12:28, Luke Seelenbinder a écrit :
>> On upgrading to 3.0.5, we began to see a lot of failed backend requests. 
>> They return successful status codes but end with termination state `SD--`. 
>> On the upstream side, the request succeeds (the upstream is also HAProxy; 
>> its termination state is `----`).
>> The data appears to be fully transferred without error, but something goes 
>> wrong towards the end of the request. This happens on a rather small 
>> percentage of requests, but I'm struggling to determine how to isolate the 
>> problem further. Timing and bytes transferred on both sides match up. 
>> Varnish is in the loop for most of these requests (but not all), and it ends 
>> up returning an error response, so it's not a spurious log line where the 
>> client doesn't register an error. To make matters worse, the response status 
>> code from the backend is successful, so the requests can't be retried using 
>> L7 retries.
> 
> Sorry, I don't understand: was the response successfully sent to the client 
> when this happens, or not? Is it "just" an issue with the termination state, 
> or is there also an issue with the response itself?
> 
>> The only change should be the upgrade from 3.0.4 to 3.0.5.
>> Our settings are pretty standard. TLS on both sides; a mix of H3, H2, and 
>> H1.1 for the frontend; exclusively client-cert TLS + H1.1 for the backend. 
>> Errors happen on all FE protocols.
>> Any tips on how to debug this further? Possibly relevant config below.
> 
> Well, if it is an issue with the termination state even though the response 
> is fully sent to the client, it may be a server shutdown that is caught too 
> early, when it is received with the last bytes of data.
> 
> At first glance, there are not many fixes that could explain that. Maybe the 
> following one, but I'm not sure:
> 
> commit e2a93b649286b30245333eec5851acd3991fda47
> Author: Christopher Faulet <[email protected]>
> Date:   Mon Jul 29 17:48:16 2024 +0200
> 
>    BUG/MEDIUM: stconn: Report error on SC on send if a previous SE error was set
> 
>    When a send on a connection is performed, if a SE error (or a pending error)
>    was already reported earlier, we leave immediately. No send is performed.
>    However, we must be sure to report the error at the SC level if necessary.
>    Indeed, the SE error may have been reported during the zero-copy data
>    forwarding. So during receive on the opposite side. In that case, we may
>    have missed the opportunity to report it at the SC level.
> 
>    The patch must be backported as far as 2.8.
> 
>    (cherry picked from commit 5dc45445ff18207dbacebf1f777e1f1abcd5065d)
>    Signed-off-by: Christopher Faulet <[email protected]>
> 
> You may try to disable the zero-copy data forwarding with the -dZ command line 
> option.
> 
> -- 
> Christopher Faulet
>
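The commit message quoted above describes an ordering problem: an error recorded on the stream endpoint (SE) during zero-copy forwarding, that is, while receiving on the opposite side, was not surfaced on the stream connector (SC) when a later send bailed out early. The Python model below is a deliberately simplified illustration of that description only; the class and function names are hypothetical and do not mirror HAProxy's C code, and whether this commit is related to the `SD--` reports is, as Christopher says, not certain.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical stand-in for a stream endpoint (SE)."""
    error: bool = False  # mimics an SE error flag


@dataclass
class Connector:
    """Hypothetical stand-in for a stream connector (SC) wrapping one SE."""
    endpoint: Endpoint = field(default_factory=Endpoint)
    error: bool = False  # what the upper layers (and the logs) would see


def zero_copy_forward(src: Connector, dst: Connector, failed: bool) -> None:
    # The receive on `src` writes straight into `dst`'s endpoint, so a
    # failure during forwarding is recorded on dst's SE flag only.
    if failed:
        dst.endpoint.error = True


def send_without_fix(sc: Connector) -> bool:
    # Early exit on a previous SE error: nothing is sent, and the SC is
    # never told about the error.
    if sc.endpoint.error:
        return False
    return True


def send_with_fix(sc: Connector) -> bool:
    # Same early exit, but the SE error is first reported at the SC level.
    if sc.endpoint.error:
        sc.error = True
        return False
    return True
```

In the model, after `zero_copy_forward(server, client, failed=True)`, both send functions refuse to send, but only `send_with_fix` leaves `client.error` set, which is the gap the commit message describes.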
