On Tue, May 9, 2017 at 9:47 PM, Willy Tarreau <w...@1wt.eu> wrote: > On Tue, May 09, 2017 at 02:54:45PM +0300, Jarno Huuskonen wrote: >> My firefox(52.1 on linux) was able to send 128k file, >> but 800k file results in connection reset. My chrome sent 16k file, but >> fails (ERR_CONNECTION_RESET) on 17k file (few times even the 17k file >> worked).
My particular request is 17575 octets in total (headers + body), and as Jarno observed, it sometimes works Firefox / Chromium, but most times Chromium trips over it. The "non-determinism" of the issue is triggered by the fact if the browser manages to push all its payload over the network before the response from HAProxy is received. (If the network latency would be zero, and the receive window on HAProxy size would be extremly small, this issue would happen every time.) Based on a few `tcpdump` captures the issue can be described in terms of TCP as this: * the client sends its headers, and starts sending the request body; * HAProxy receives the headers, determines it's a redirect; * HAProxy writes the response status line and headers, and closes the connection via a reset packet; * meanwhile the client while still writing to the socket its payload, the reset packet is received which puts the client into an error state (because it tries to write to a closed socket); * thus the client just errors out without reading the response; Sometimes the reset packet is received after the client has finished writing, thus the condition is not encountered any more. (On explicit request over private email, I can provide a small tcpdump capture displaying the behaviour described herein.) > Hmmm that sounds bad, it looks like we've broken something again. I don't think HAProxy was "broken" this time, as we are using HAProxy 1.6.11 since last November, and only recently (since two-three weeks ago) we started encountering this issue without having any major changes to either the HAProxy configuration or the web application where we encountered this issue. In fact I think the browsers "broke" something, as only with recent variants of Chromium and Firefox we encounter this. However, since there are far less few HAProxy deployments than browsers (say 50.000 to 1 in our deployment), I think the easiest component to fix is HAProxy, by making it "drain" the entire connection before closing it. (Although I have no idea how complicated is to do this.) > What status code are you facing there ? I remember we've had such > an issue in the past where the server timeout could expire during > a long upload because it was not being refreshed during the upload > (it would thus result in a 504). This is the point, there is no "error" situation, as HAProxy just issues the redirect status line and headers and then resets the connection. Thanks, Ciprian.