Willy,

Thanks for your analysis and reply. Greatly appreciated.

I adjusted the "mss" on our bind line, trying values between your
suggested 1380 and 1460 (the latter came from another source).
Unfortunately, the problem persists.

Most of the failures we've seen so far have not been from browsers,
but from Git and Mercurial (this is all for bitbucket.org). However,
I've noticed that even browsers fail:

Oct  5 16:05:30 bb10 haproxy[29642]: 108.235.116.212:51558
[05/Oct/2012:16:05:28.542] ssl servers-ssl/bb12 1746/0/0/-1/+2414 -1
+0 - - CH-- 398/263/9/0/0 0/0 "POST /redacted/sf/issue-attachment/196/
HTTP/1.1"
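
For reference, the timers field in a line like the one above can be
pulled apart with a short script. This is only a minimal sketch: it
assumes the HAProxy 1.4-style HTTP log layout, where the
Tq/Tw/Tc/Tr/Tt timers are the third field after the bracketed accept
date, and the name parse_timers is made up for illustration.

```python
# Timer names as given in the HAProxy HTTP log format documentation.
TIMER_NAMES = ("Tq", "Tw", "Tc", "Tr", "Tt")

# The browser failure from this mail, re-joined onto one line.
SAMPLE = (
    'Oct  5 16:05:30 bb10 haproxy[29642]: 108.235.116.212:51558 '
    '[05/Oct/2012:16:05:28.542] ssl servers-ssl/bb12 1746/0/0/-1/+2414 -1 '
    '+0 - - CH-- 398/263/9/0/0 0/0 '
    '"POST /redacted/sf/issue-attachment/196/ HTTP/1.1"'
)


def parse_timers(log_line):
    """Return the five timers (in milliseconds) from an HTTP log line."""
    tokens = log_line.split()
    # The accept date is the only token wrapped in square brackets.
    date_idx = next(
        i for i, tok in enumerate(tokens)
        if tok.startswith("[") and tok.endswith("]")
    )
    # Assumed layout: [date] frontend backend/server timers ...
    raw_timers = tokens[date_idx + 3].split("/")
    # A leading "+" on a timer is a log marker, not part of the value.
    return {
        name: int(value.lstrip("+"))
        for name, value in zip(TIMER_NAMES, raw_timers)
    }


if __name__ == "__main__":
    print(parse_timers(SAMPLE))
```

Read this way, Tq (time to receive the request headers) is 1746 ms and
Tr is -1, meaning no complete response was received from the server
before the session ended.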

We are binding to 127.0.0.1, as we are sitting behind stud, an SSL/TLS
terminator. I realize 1.5-dev12 has SSL support, but that is quite
recent, so we're still using the stud->haproxy setup.

I'd be more than happy to provide as much information as I can on this issue.

Any other ideas, or indications of what might be wrong?


Jesper

On Thu, Oct 4, 2012 at 11:13 PM, Willy Tarreau <[email protected]> wrote:
> On Thu, Oct 04, 2012 at 04:46:00PM -0700, Jesper Noehr wrote:
>> Hi list,
>>
>> I'm debugging a very strange issue, where *something* is hanging up on
>> the client.
>>
>> We're seeing this in the log whenever it happens:
>>
>> Oct  4 13:46:27 bb10 haproxy[14480]: 97.89.249.238:28856
>> [04/Oct/2012:13:38:58.282] ssl servers-git/bb05 75/0/0/-1/+449030 -1
>> +0 - - CH-- 391/275/9/0/0 0/0 "POST /Envoc/brdnug.git/git-receive-pack
>> HTTP/1.1"
>> Oct  4 13:46:47 bb10 haproxy[14480]: 199.16.190.194:29765
>> [04/Oct/2012:13:46:29.173] ssl servers-hg/bb12 20/0/0/-1/+18450 -1 +0
>> - - CH-- 380/262/12/0/0 0/0 "POST /qlovi/qlovi2?cmd=unbundle HTTP/1.1"
>>
>> On the client, it shows "An established connection was aborted by the
>> software in your host machine".
>>
>> It happens sporadically, and at random times during the stream of
>> data. It always happens on POST requests. It also does not seem to
>> coincide with any of the timeouts, as it can take 5 seconds, or 5
>> minutes for it to happen. I should also point out, it seems to mainly
>> happen for people with high latency.
>>
>> Reading the docs, the "CH--" is said to indicate: The timeout client
>> stroke while waiting for client data during a POST request. This is
>> sometimes caused by too large TCP MSS values for PPPoE networks which
>> cannot transport full-sized packets. It can also happen when client
>> timeout is smaller than server timeout and the server takes too long
>> to respond.
>>
>> This seems to say that the *client* hangs up, not the server. Client
>> timeout is 5000 seconds, and the server timeout is 1 hour.
>
> Yes, it's the client which hangs up. If you don't receive the full
> client's request for 450 seconds on a POST, generally it is because
> a part of the final headers were sent in a large TCP segment containing
> data that was either blocked by some router due to too large a packet
> for the MTU, or blocked by some front stupid equipment such as an IDS
> which believes it has found a signature for a known attack in the packet.
>
> In any case, when haproxy is tired of waiting, it closes the connection,
> the client receives the reset and reports a connection aborted by the
> server (which is true).
>
> There is nothing wrong in your configuration BTW. I'm seeing that you
> have an accept-proxy option, so I guess that your clients are home-made
> clients, not browsers.
>
> If you want to see if it's an MTU issue or not, you can easily force
> the MSS on the "bind" line :
>
>     bind 127.0.0.1:8080 accept-proxy mss 1380
>
> Regards,
> Willy
>
