Hi Baiyang,
On Tue, Nov 24, 2015 at 11:09:33PM +0800, baiyang wrote:
> Hi Willy
>
> I've found generally it is very peaceful with the first 24 hours after a
> reboot. And usually one reproducing in the next day. And more frequently (2-3
> reproducing) at the third day.
>
> May be it's a tip?
I think I understand the pattern that produces the problem based on
your "show sess" output, thanks so much for these. Every time the
pattern is the same :
- request channel was closed end-to-end
- response channel was closed on the server side
- a timeout had struck
- data are still present in the response buffer
So I think what happens is the following :
1) client stops reading for whatever reason
2) client decides to stop and sends a shutdown() on the request channel
3) haproxy forwards this shutdown() to the server during the transfer.
4) the server acks the shutdown() and shuts down in turn
5) the shutdown is queued in the response channel after the pending data
waiting to be delivered to the client
6) the client never reads them (e.g. a firewall cuts any further traffic),
then the timeout strikes
7) a bug in the timeout handler makes haproxy ignore this specific condition,
so we loop between 6 and 7.
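
To make the suspected loop between 6 and 7 concrete, here is a toy model in
Python. It is not haproxy code: the ResponseChannel class, both handlers, the
byte count and the wakeup limit are all made up for illustration, and the real
bug may well look different once I find it.

```python
from dataclasses import dataclass


@dataclass
class ResponseChannel:
    """Toy stand-in for a response channel (hypothetical, not haproxy's struct)."""
    pending_bytes: int      # data still waiting to be delivered to the client
    shutdown_queued: bool   # server's shutdown queued behind that data (step 5)
    closed: bool = False


def buggy_timeout_handler(chn: ResponseChannel) -> None:
    # Assumed bug: a shutdown is already queued, so the handler believes the
    # close is in progress and leaves the channel untouched.
    if chn.shutdown_queued and chn.pending_bytes > 0:
        return
    chn.closed = True


def fixed_timeout_handler(chn: ResponseChannel) -> None:
    # Expected behaviour: on timeout, drop the undeliverable data and close.
    chn.pending_bytes = 0
    chn.closed = True


def run(handler, max_wakeups: int = 5):
    """Replay steps 6 and 7 until the channel closes or we give up."""
    chn = ResponseChannel(pending_bytes=4096, shutdown_queued=True)
    wakeups = 0
    while not chn.closed and wakeups < max_wakeups:
        # Step 6: the client never reads, so the timeout strikes again.
        handler(chn)  # step 7
        wakeups += 1
    return chn.closed, wakeups


if __name__ == "__main__":
    print("buggy:", run(buggy_timeout_handler))
    print("fixed:", run(fixed_timeout_handler))
```

With the buggy handler the session is still open after every wakeup, which is
what a stuck session would look like in "show sess"; with the fixed one it
closes on the first timeout.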
I'll try to see if that makes sense, but it should produce exactly your
output. It can also explain why it's rare and why when it happens, we
find multiple affected sessions from the same client.
In the meantime I suspect that removing "option abortonclose" will change
the situation by preventing the client's shutdown from reaching the server,
and will result in the client timeout being correctly handled.
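
For the test, something along these lines should do. This is only a sketch:
I'm assuming the option sits in your defaults section, and the backend name
"app" is a placeholder, not taken from your real config.

```
defaults
    # option abortonclose       # comment out the existing line, or

backend app                     # hypothetical backend name
    no option abortonclose      # negate it here if it is inherited from defaults
```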
I remember that Cyril reported to me a few occurrences of something that
looked similar but didn't last long enough to be caught live or analyzed
(probably it gets killed once one of the other timeouts strikes, and in
your case the difference is large enough to let you see the problem last
long).
Cyril, I would appreciate it if you could check whether your affected
config also uses "option abortonclose".
Thanks,
Willy