On Thu, Jun 14, 2018 at 01:51:20PM +0200, Janusz Dziemidowicz wrote:
> 2018-06-14 11:46 GMT+02:00 Willy Tarreau <[email protected]>:
> >> Will try.
> 
> I've tried the second patch, together with the first one, no change at all.
> 
> However, I was able to catch it on my laptop finally. I still can't
> easily reproduce this, but at least that's something. Little
> background, my company makes online games, the one I am testing with
> is a web browser flash game. As it starts, it makes various API calls
> and loads game resources, graphics/music, etc. So I've disabled
> browser cache and tried closing browser tab with the game as it was
> loading. After a couple of tries I've achieved the following state:
> tcp6    1190      0 SERVER_IP:443 MY_IP:54514 ESTABLISHED 538049/haproxy
> 
> This is with browser tab already closed. Browser (latest Chrome)
> probably keeps the connection alive, but haproxy should close it after
> a while. Well, that didn't happen, after a good 30 minutes the
> connection is still ESTABLISHED. My timeouts are at the beginning of
> this thread, my understanding is that this connection should be killed
> after "timeout client" which is 60s.
> After that I've closed the browser completely. Connection moved to the
> CLOSE_WAIT state in question:
> tcp6    1191      0 SERVER_IP:443 MY_IP:54514 CLOSE_WAIT  538049/haproxy
> 
> haproxy logs (I have dontlognormal enabled): https://pastebin.com/sUsa6jNQ

Thank you! I've just found a bug which I suspect could be related. By trying
to exploit it I managed to reproduce the problem once, and after the fix I
couldn't anymore. That's not enough to draw a conclusion but I suspect I'm
on the right track.

I found that the case where some extra data are pending after a
chunked-encoded response is not properly handled by the H2 encoder:
the data are properly deleted but are not reported as being part of what
was sent. This can cause the upper layer to believe that nothing was sent
and to continue to wait. When trying to send responses containing garbage
after the final chunk, I ended in the same situation you saw, with an H2
connection still present in "show fd" and the timeout not getting rid of
it, apparently because there's permanently a stream attached to it. I also
remember we faced a similar situation in the early 1.8 with extra data
after content-length not being properly trimmed. It could very well be
similar here. It's unclear to me why the stream timeout doesn't trigger
(probably because the stream is considered completed, which would be the
root cause of the problem), but these data definitely need to be reported
as deleted.
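
To illustrate the accounting problem, here is a toy sketch in C. It is
neither the real HAProxy code nor the attached patch: toy_buf,
toy_send_buggy(), toy_send_fixed() and toy_made_progress() are made-up
names, and the "upper layer" is reduced to a single check on the returned
byte count. It only shows how dropping bytes without counting them can look
like "nothing was sent" to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an output buffer; not HAProxy's struct buffer. */
struct toy_buf {
    size_t pending;   /* bytes waiting: real body bytes plus trailing garbage */
};

/* Buggy accounting: the whole buffer is consumed, but only real body bytes
 * are reported. body_left is how many body bytes remain to be emitted. */
static size_t toy_send_buggy(struct toy_buf *b, size_t body_left)
{
    size_t sent = body_left < b->pending ? body_left : b->pending;

    b->pending = 0;   /* body emitted, trailing garbage silently dropped */
    return sent;      /* dropped bytes are NOT counted */
}

/* Fixed accounting: every byte removed from the buffer is reported,
 * whether it was emitted or discarded. */
static size_t toy_send_fixed(struct toy_buf *b, size_t body_left)
{
    size_t consumed = b->pending;

    (void)body_left;  /* only the total consumed matters to the caller */
    b->pending = 0;
    return consumed;
}

/* Toy upper layer: it considers that progress was made only if the
 * lower layer reports a non-zero amount. */
static int toy_made_progress(size_t reported)
{
    return reported > 0;
}

/* Small demonstration: the body is already fully sent (body_left == 0)
 * and 7 bytes of garbage follow the final chunk. */
int toy_demo(void)
{
    struct toy_buf b = { 7 };

    /* Buggy path: buffer is emptied, yet the caller sees no progress. */
    assert(!toy_made_progress(toy_send_buggy(&b, 0)));
    assert(b.pending == 0);

    /* Fixed path: the same 7 bytes are reported as consumed. */
    b.pending = 7;
    assert(toy_made_progress(toy_send_fixed(&b, 0)));
    return 0;
}
```

In this sketch the buggy variant leaves the caller with an empty buffer and a
zero return value, which is the kind of state in which it may keep waiting
forever.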

If you'd like to run a test, I'm attaching the patch.

Cheers,
Willy
