On 2017/5/29 08:22, Frederic Lecaille wrote:
> Hi Patrick,
>
> First, thank you for this nice and helpful report.
>
> Would it be possible to have the output of this command the next time
> you reproduce such an issue, please?
>
>     echo "show sess" | socat stdio <haproxy stats socket path>

Unfortunately this would not be possible. When the issue occurs, the
haproxy process has stopped accepting connections on all sockets. If I
were to run this command, it would be sent to the new process, not the
one that won't shut down.

> I have only one question (see below).
>
> On 05/24/2017 10:40 AM, Willy Tarreau wrote:
>> Hi Patrick,
>>
>> On Tue, May 23, 2017 at 01:49:42PM -0400, Patrick Hemmer wrote:
>> (...)
>>> haproxy 28856 root 1u IPv4 420797940 0t0 TCP 10.0.33.145:35754->10.0.33.147:1029 (CLOSE_WAIT)
>>> haproxy 28856 root 2u IPv4 420266351 0t0 TCP 10.0.33.145:52898->10.0.33.147:1029 (CLOSE_WAIT)
>>> haproxy 28856 root 3r REG 0,3 0 4026531956 net
>>> haproxy 28856 root 4u IPv4 422150834 0t0 TCP 10.0.33.145:38874->10.0.33.147:1029 (CLOSE_WAIT)
>>
>> These ones are very interesting.
>
> These traces also seem interesting to me.
>
>     # strace -p 28856
>     Process 28856 attached
>     epoll_wait(0, {}, 200, 319) = 0
>     epoll_wait(0, {}, 200, 0) = 0
>     epoll_wait(0, {}, 200, 362) = 0
>     epoll_wait(0, {}, 200, 0) = 0
>     epoll_wait(0, {}, 200, 114) = 0
>     epoll_wait(0, {}, 200, 0) = 0
>     epoll_wait(0, {}, 200, 203) = 0
>     epoll_wait(0, {}, 200, 0) = 0
>     epoll_wait(0, {}, 200, 331) = 0
>     epoll_wait(0, {}, 200, 0)
>
> Were such "epoll_wait(0, {}, 200, 0)" calls displayed indefinitely?

Yes.

> In fact I am wondering if it is normal to have so many
> epoll_wait(0, {}, 200, 0) calls for a haproxy process which has shut
> down.
>
> I suspect they are related to peer tasks (which have obviously
> expired).
>
> If this is the case, then with configurations containing only peer
> tasks, haproxy would hang permanently, consuming a lot of CPU
> resources.

HAProxy was not consuming high CPU. Note that every other call to
`epoll_wait` had a timeout (the 4th argument) > 0. If every single
timeout value were 0, then yes, it would spin and consume CPU.

> So, I had a look at the code handling the 'expire' member of the peer
> struct task, and I have just found a situation where pollers in
> relation with peer tasks are often called with an expired timeout,
> leading haproxy to consume a lot of CPU resources. In fact this
> happens each time the peer task has expired within a fraction of a
> second.
>
> It is easy to reproduce this issue with a sort of peer simulator ;):
>
>     strace -ttf socat TCP4-LISTEN:<peer port>,reuseaddr,fork SYSTEM:"echo 200;sleep 10"
>
> This peer must be started *before* the other remote haproxy process
> which has only peers as backends.
>
> strace is here only to give an idea of the moment when the remote
> haproxy peer has just connected.
>
> The sleep command is here to leave enough time to suspend (Ctrl+S) our
> peer simulator process after the haproxy peer has just connected.
>
> So this peer accepts any remote peer session, sending "200" status
> messages (and that's all).
>
> A haproxy peer which connects to such a peer, and which gets no reply
> to its synchronization request, would endlessly consume high CPU
> resources until you resume (Ctrl+Q) the peer simulator process.
>
> *Unfortunately, I do not see any relation between this bug and the
> "CLOSE_WAIT peer state issue" which prevents haproxy from correctly
> shutting down.*
>
> I have attached a patch to this mail which fixes this issue.

Again, we're not seeing high CPU usage in this specific case. We have
reported a completely different scenario where haproxy starts consuming
CPU calling `epoll_wait(x, x, x, 0)` in a loop, but this is not that.
Every time this shutdown issue occurs, the process is not consuming
CPU.
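
To illustrate the distinction, here is a minimal sketch of how an
epoll-based event loop typically derives its poll timeout from the next
task expiry, and why an expired task that never gets rescheduled turns
the loop into a CPU-burning spin. This is plain C and not haproxy's
actual scheduler; next_task_expiry() is a hypothetical helper standing
in for the task queue:

    /* Minimal sketch, assuming a ms-granularity monotonic clock and a
     * task queue exposing the nearest expiry date. NOT haproxy code. */
    #include <sys/epoll.h>
    #include <time.h>

    #define MAX_EVENTS 200

    /* hypothetical helper: expiry date (ms) of the nearest queued
     * task, e.g. a peers resync timeout */
    extern long next_task_expiry(void);

    static long now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
    }

    void event_loop(int epfd)
    {
        struct epoll_event events[MAX_EVENTS];

        for (;;) {
            /* Sleep until the next task is due or an fd is ready. */
            long timeout = next_task_expiry() - now_ms();
            if (timeout < 0)
                timeout = 0; /* task already expired: poll returns at once */

            int n = epoll_wait(epfd, events, MAX_EVENTS, (int)timeout);

            /* ... run expired tasks, handle the n ready events ... */
            (void)n;

            /* If an expired task's handler never pushes its expire
             * date into the future, we come back here with timeout == 0
             * on every iteration and spin at 100% CPU. With a positive
             * timeout, epoll_wait() sleeps and no CPU is burned. */
        }
    }

In our strace output roughly every other timeout was positive, so the
loop was still sleeping normally, which matches an idle process rather
than the spinning one Frederic's patch addresses.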
However it is possible that the two issues share the same root cause. I
will try out the patch and see what happens.

Thanks,

-Patrick

> Regards,
>
> Fred.