I went through commits on master that aren't in 7.72.0 and came across
728f8d3bd
<https://github.com/curl/curl/commit/728f8d3bdc336e3fa838f45cad6c0133a6b604ae>,
which looks promising. It says it partially reverts a change from 7.65.2,
which is kind of weird because this problem does *not* occur for me in
7.71.1, which suggests that one of the other changes in 7.72.0 (maybe the
schannel stuff?) somehow had some spooky-action-at-a-distance and made this
much worse in 7.72.0. I feel like 728f8d3bd still has a race condition if
the FIN has been issued by the server but not yet processed by poll, but
that's neither here nor there.
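
To make that race concrete, here is a minimal Python sketch (standard
library only; this is not libcurl's code, and the names
connection_looks_dead and demo are made up for illustration). It shows the
general idea of probing a pooled socket for a pending FIN before reusing
it, assuming an idle keep-alive connection should never have anything
readable on it:

```python
import select
import socket


def connection_looks_dead(sock):
    """Return True if the peer has already closed this idle socket.

    Hypothetical helper: on an idle keep-alive connection nothing should
    be readable, so a readable socket whose peek returns b"" means a FIN
    has already arrived.
    """
    # Zero timeout: poll the socket without blocking.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing pending; the connection looks alive right now
    try:
        # MSG_PEEK leaves any real data in the receive buffer.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except (ConnectionResetError, BrokenPipeError):
        return True


def demo():
    """Open a loopback connection, close the server side, probe both times."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    before = connection_looks_dead(client)  # server still holds it open
    server_side.close()                     # stands in for the idle timeout
    select.select([client], [], [], 5.0)    # wait until the FIN is visible
    after = connection_looks_dead(client)

    client.close()
    listener.close()
    return before, after
```

Note that a probe like this can only ever say the socket looked alive at
the moment of the check. A FIN that arrives between the probe and the next
send is still missed, so the caller would still need to retry on a fresh
connection when a send on a reused socket fails.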

Anyhow, I cherry-picked 728f8d3bd and it does seem to have fixed my issue
(or, at least, my issue hasn't occurred in the intervening ~30 minutes).

On Mon, Sep 14, 2020 at 2:38 PM James Brown <jbr...@easypost.com> wrote:

> Unfortunately, it takes several hours of running to identify this
> failure; building a new package and bisecting is going to be quite
> difficult with that long of a turnaround.
>
> I was, however, able to get a tcpdump of one of the affected sessions.
>
> This session spans several requests to the same backend (Typhoeus pools
> curl sockets and tries to use keep-alive whenever possible). The first few
> requests succeed; at some point there's a ~9-second break in requests while
> the process does other stuff and the server closes the TCP stream (our load
> balancer is configured to close idle keep-alive sessions after 9 seconds).
> Within a few milliseconds, another request is sent on the same socket; it
> immediately gets back a RST because the server has already closed this
> socket. That appears to cause this exception to be raised by libcurl.
>
> Did anything change around handling keep-alive session expiry in HTTP/1.1
> mode? Nothing jumps out at me in the git log (there's something in the
> schannel backend, but I'm on Linux; there's also d5bb459ccf
> <https://github.com/curl/curl/commit/d5bb459ccf1fc5980ae4b95c05b4ecf6454a7599>
> which claims to only affect CONNECT-only connections, and all of this is
> regular GETs and POSTs)...
>
> On Fri, Sep 11, 2020 at 10:34 PM Ray Satiro via curl-library <
> curl-library@cool.haxx.se> wrote:
>
>> On 9/11/2020 2:03 PM, James Brown via curl-library wrote:
>>
>> After upgrading a test cluster from 7.71.1 to 7.72.0, we're now seeing
>> around 0.1% of POSTs from one (and only one) of our applications fail with
>> "Failed sending data to the peer" (CURLE_SEND_ERROR) and no other error.
>> Based on logs, the request actually succeeds, but libcurl is returning this
>> error. This application is using the Ruby Typhoeus wrapper and is itself
>> unchanged. The relevant connections are all HTTP/1.1 connections to hosts
>> on the local network, and the POSTs are all very small (<1KB) with nothing
>> interesting about them.
>>
>> I haven't had any luck tracking this down since it's such a low fraction
>> of requests and is only affecting one of our several hundred applications,
>> but it reproducibly happens with 7.72 and not with 7.71.1.
>>
>> Anyone have any suggestions for how to try to track down the regression?
>> I looked at the diff between 7.71.1 and 7.72.0 and no lines containing the
>> string "CURLE_SEND_ERROR" were touched, which is unfortunate.
>>
>>
>> There are no similar reports and I looked through the commit history but
>> nothing stood out. If you can reliably reproduce it, then try bisecting:
>> https://github.com/curl/curl/wiki/how-to-git-bisect
>>
>>
>> -------------------------------------------------------------------
>> Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
>> Etiquette:   https://curl.haxx.se/mail/etiquette.html
>
>
>
> --
> James Brown
> Engineer
>


-- 
James Brown
Engineer