On 2017/5/29 08:22, Frederic Lecaille wrote:
> Hi Patrick,
>
> First, thank you for this nice and helpful report.
>
> Would it be possible to have the output of this command the next time
> you reproduce such an issue, please?
>
>     echo "show sess" | socat stdio <haproxy stats socket path>

Unfortunately this would not be possible. When the issue occurs, the
haproxy process has stopped accepting connections on all sockets. If I
were to run this command, it would be sent to the new process, not the
one that won't shut down.

> I have only one question (see below).
>
> On 05/24/2017 10:40 AM, Willy Tarreau wrote:
>> Hi Patrick,
>>
>> On Tue, May 23, 2017 at 01:49:42PM -0400, Patrick Hemmer wrote:
>> (...)
>>> haproxy 28856 root 1u IPv4 420797940 0t0 TCP 10.0.33.145:35754->10.0.33.147:1029 (CLOSE_WAIT)
>>> haproxy 28856 root 2u IPv4 420266351 0t0 TCP 10.0.33.145:52898->10.0.33.147:1029 (CLOSE_WAIT)
>>> haproxy 28856 root 3r REG 0,3 0 4026531956 net
>>> haproxy 28856 root 4u IPv4 422150834 0t0 TCP 10.0.33.145:38874->10.0.33.147:1029 (CLOSE_WAIT)
>>
>> These ones are very interesting.
>
> These traces also seem interesting to me.
>
>     # strace -p 28856
>     Process 28856 attached
>     epoll_wait(0, {}, 200, 319) = 0
>     epoll_wait(0, {}, 200, 0) = 0
>     epoll_wait(0, {}, 200, 362) = 0
>     epoll_wait(0, {}, 200, 0) = 0
>     epoll_wait(0, {}, 200, 114) = 0
>     epoll_wait(0, {}, 200, 0) = 0
>     epoll_wait(0, {}, 200, 203) = 0
>     epoll_wait(0, {}, 200, 0) = 0
>     epoll_wait(0, {}, 200, 331) = 0
>     epoll_wait(0, {}, 200, 0)
>
> Were such "epoll_wait(0, {}, 200, 0)" calls displayed indefinitely?

Yes.

> In fact I am wondering if it is normal to have so many
> epoll_wait(0, {}, 200, 0) calls for a haproxy process which has shut
> down.
>
> I suspect they are related to peer tasks (which have obviously
> expired).
>
> If this is the case, then with configurations containing only peer
> tasks, haproxy would hang permanently, consuming a lot of CPU
> resources.

HAProxy was not consuming high CPU. Note that every other call to
`epoll_wait` had a timeout (the 4th argument) > 0. If every single
timeout value were 0, then yes, it would spin and consume CPU.

> So, I had a look at the code handling the 'expire' member of the peer
> struct task, and I have just found a situation where pollers in
> relation with peer tasks are often called with an expired timeout,
> leading haproxy to consume a lot of CPU resources. In fact this
> happens each time the peer task has expired within a fraction of a
> second.
>
> It is easy to reproduce this issue with a sort of peer simulator ;):
>
>     strace -ttf socat TCP4-LISTEN:<peer port>,reuseaddr,fork SYSTEM:"echo 200;sleep 10"
>
> This peer must be started *before* the other remote haproxy process
> which has only peers as backends.
>
> strace is here only to give an idea of the moment when the remote
> haproxy peer has just connected.
>
> The sleep command is here to leave enough time to suspend (Ctrl+S) our
> peer simulator process after the haproxy peer has just connected.
>
> So this peer accepts any remote peer session, sending "200" status
> messages (and that's all).
>
> A haproxy peer which connects to such a peer, and which gets no reply
> to its synchronization request, would endlessly consume high CPU
> resources until you resume (Ctrl+Q) the peer simulator process.
>
> *Unfortunately, I do not see any relation between this bug and the
> "CLOSE_WAIT peer state issue" which prevents haproxy from correctly
> shutting down.*
>
> I have attached a patch to this mail which fixes this issue.

Again, we're not seeing high CPU usage in this specific case. We have
reported a completely different scenario where haproxy starts consuming
CPU calling `epoll_wait(x, x, x, 0)` in a loop, but this is not that.
Every time this shutdown issue occurs, the process is not consuming
CPU.
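
To illustrate the distinction, here is a minimal sketch of how an
epoll-based event loop typically derives its poll timeout from the next
task expiry, and why an expired task that never gets rescheduled turns
the loop into a CPU-burning spin. This is plain C and not haproxy's
actual scheduler; next_task_expiry() is a hypothetical helper standing
in for the task queue:

    /* Minimal sketch, assuming a ms-granularity monotonic clock and a
     * task queue exposing the nearest expiry date. NOT haproxy code. */
    #include <sys/epoll.h>
    #include <time.h>

    #define MAX_EVENTS 200

    /* hypothetical helper: expiry date (ms) of the nearest queued
     * task, e.g. a peers resync timeout */
    extern long next_task_expiry(void);

    static long now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
    }

    void event_loop(int epfd)
    {
        struct epoll_event events[MAX_EVENTS];

        for (;;) {
            /* Sleep until the next task is due or an fd is ready. */
            long timeout = next_task_expiry() - now_ms();
            if (timeout < 0)
                timeout = 0; /* task already expired: poll returns at once */

            int n = epoll_wait(epfd, events, MAX_EVENTS, (int)timeout);

            /* ... run expired tasks, handle the n ready events ... */
            (void)n;

            /* If an expired task's handler never pushes its expire
             * date into the future, we come back here with timeout == 0
             * on every iteration and spin at 100% CPU. With a positive
             * timeout, epoll_wait() sleeps and no CPU is burned. */
        }
    }

In our strace output roughly every other timeout was positive, so the
loop was still sleeping normally, which matches an idle process rather
than the spinning one Frederic's patch addresses.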
However it is possible that the two issues share the same root cause. I
will try out the patch and see what happens.

Thanks,

-Patrick

> Regards,
>
> Fred.