Unfortunately, it takes several hours of running to reproduce this
failure, so building a new package for each step of a bisect is going to
be quite difficult with that long a turnaround.

I was, however, able to capture a tcpdump of one of the affected sessions.

This session spans several requests to the same backend (Typhoeus pools
curl sockets and tries to use keep-alive whenever possible). The first few
requests succeed; at some point there's a ~9-second break in requests while
the process does other stuff and the server closes the TCP stream (our load
balancer is configured to close idle keep-alive sessions after 9 seconds).
Within a few milliseconds of that close, another request is sent on the
same socket; it immediately gets back a RST because the server has already
closed the connection. That appears to be what makes libcurl return
CURLE_SEND_ERROR.
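
To make the sequence above concrete, here is a minimal Python sketch of
checking whether the peer has already closed an idle connection before
reusing it. It is only an illustration: peer_has_closed and demo are
hypothetical names, a local socketpair stands in for the real TCP
connection to the load balancer, and I am not claiming this is how
libcurl or Typhoeus performs the check.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has already closed its end of an idle socket.

    Hypothetical helper: polls without blocking, then peeks one byte.
    An empty read on a readable socket means the peer sent EOF (FIN).
    """
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending, so no close has been seen yet
    try:
        # MSG_PEEK leaves any real response data in place for the next reader.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except (ConnectionResetError, BrokenPipeError):
        return True  # the connection was reset outright


def demo():
    # Assumption: a socketpair stands in for the client/server connection.
    client, server = socket.socketpair()
    try:
        before = peer_has_closed(client)  # idle but still open
        server.close()                    # "server" drops the idle session
        after = peer_has_closed(client)   # the close is now visible
    finally:
        client.close()
    return before, after
```

Even with a check like this, a request can still race the close: if the
FIN arrives after the check but before the send, the request goes out on
a dead connection, which matches the few-millisecond gap in the capture.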

Did anything change around handling keep-alive session expiry in HTTP/1.1
mode? Nothing jumps out at me in the git log (there's something in the
schannel backend, but I'm on Linux; there's also d5bb459ccf
<https://github.com/curl/curl/commit/d5bb459ccf1fc5980ae4b95c05b4ecf6454a7599>,
which claims to only affect CONNECT-only connections, and all of this is
regular GETs and POSTs)...

On Fri, Sep 11, 2020 at 10:34 PM Ray Satiro via curl-library <
curl-library@cool.haxx.se> wrote:

> On 9/11/2020 2:03 PM, James Brown via curl-library wrote:
>
> After upgrading a test cluster from 7.71.1 to 7.72.0, we're now seeing
> around 0.1% of POSTs from one (and only one) of our applications fail with
> "Failed sending data to the peer" (CURLE_SEND_ERROR) and no other error.
> Based on logs, the request actually succeeds, but libcurl is returning this
> error. This application is using the Ruby Typhoeus wrapper and is itself
> unchanged. The relevant connections are all HTTP/1.1 connections to hosts
> on the local network, and the POSTs are all very small (<1KB) with nothing
> interesting about them.
>
> I haven't had any luck tracking this down since it's such a low fraction
> of requests and is only affecting one of our several hundred applications,
> but it reproducibly happens with 7.72 and not with 7.71.1.
>
> Anyone have any suggestions for how to try to track down the regression? I
> looked at the diff between 7.71.1 and 7.72.0 and no lines containing the
> string "CURLE_SEND_ERROR" were touched, which is unfortunate.
>
>
> There are no similar reports and I looked through the commit history but
> nothing stood out. If you can reliably reproduce then try bisecting it
> https://github.com/curl/curl/wiki/how-to-git-bisect
>
>
> -------------------------------------------------------------------
> Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
> Etiquette:   https://curl.haxx.se/mail/etiquette.html



-- 
James Brown
Engineer
