Hi Folks,

This is a strange one and I haven't yet been able to duplicate. But I wanted to report the description of what did happen in case it was either a known issue or one that would seem likely based on the code.

The servers in question are running HAProxy 1.7.1 on FreeBSD-11.

I made a change to a small text file used for ACL comparisons that, to my understanding, is only read when HAPRoxy is started or reloaded. This file was pushed out to all servers at essentially the same time, but there was no corresponding reload command issued at that time, and no other changes were made in this time frame - in fact, no actions taken at all to these servers for about 10 hours prior.

This file is referenced like so in a "frontend" block:

acl aclname src -f /path/to/file.txt

Approximately 23 minutes after pushing out this change, all HAProxy servers stopped accepting packets. TCP connections were being made at least for some time, but this may have been due to the OS accept queue rather than HAProxy actually accepting packets. In any case, no requests were being handled or logged.

No errors were logged, nor were health checks being issued or logged once things ground to a halt. Although the haproxy service was running and therefore not generating any alerts or automated service restarts, our external monitoring picked up the situation.

Seeing as these servers are in production my first action was to issue a reload to all servers, which took care of the immediate problem, after which I had time to dig through the logs looking for information about exactly what had failed.

What I found in addition to the above, was that instead of a single HAProxy process, there were two listed in ps:

92539 www 1 103 0 187M 133M CPU6 6 109:28 100.00% haproxy 12993 www 1 29 0 147M 96636K kqread 12 5:57 16.06% haproxy

truss showed no activity on the first PID, and killing that process had no negative effect upon the second, running process.

So, to sum up, what I observed was

1) changed a file referenced by an ACL (actual change was moving the location of a line within the file, not adding or deleting the total number of lines) 2) HAProxy ceased operating about 23 minutes later, but was still listed in 'ps' 3) after issuing a reload, there remained an HAProxy process, presumably the one that had stopped accepting requests.

Is it possible that a modification to a file referenced by an ACL (without an associated reload) could cause this sort of issue?

Best,
-=Mark

Reply via email to