As a workaround, I've patched HAProxy to check whether a signal is already
present in the signal queue, rather than relying on signal_state[sig].count,
when deciding whether to enqueue it.
--- a/src/signal.c 2015-02-01 06:54:32.000000000 +0000
+++ b/src/signal.c 2015-03-18 11:30:19.683413000 +0000
@@ -35,6 +35,17 @@
* Signal number zero has a specific status, as it cannot be delivered by the
* system, any function may call it to perform asynchronous signal delivery.
*/
+
+static int is_signal_queued(int sig)
+{
+ int i;
+ for (i = 0; i < signal_queue_len; i++) {
+ if (sig == signal_queue[i])
+ return 1;
+ }
+ return 0;
+}
+
void signal_handler(int sig)
{
if (sig < 0 || sig >= MAX_SIGNAL) {
@@ -44,7 +55,7 @@
return;
}
- if (!signal_state[sig].count) {
+ if (!is_signal_queued(sig)) {
/* signal was not queued yet */
if (signal_queue_len < MAX_SIGNAL)
signal_queue[signal_queue_len++] = sig;
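To show the intent of the patch outside of HAProxy, here is a minimal standalone sketch of the patched enqueue logic. The names struct sigq, sigq_contains and sigq_handler_patched are hypothetical and exist only for this illustration, MAX_SIGNAL is assumed to be 256, and the count increment after the check is assumed from the gdb output (805 -> 806), not copied from src/signal.c.

```c
#include <assert.h>

#define MAX_SIGNAL 256 /* assumed value, for this model only */

/* Hypothetical stand-in for signal_state[].count, signal_queue and
 * signal_queue_len; not the real HAProxy structures. */
struct sigq {
    unsigned int count[MAX_SIGNAL]; /* pending events per signal */
    int queue[MAX_SIGNAL];          /* signals waiting to be processed */
    int len;                        /* number of valid queue entries */
};

/* Returns 1 if sig is already in the queue, scanning indices 0..len-1. */
static int sigq_contains(const struct sigq *q, int sig)
{
    int i;

    for (i = 0; i < q->len; i++)
        if (q->queue[i] == sig)
            return 1;
    return 0;
}

/* Patched behaviour: decide on queue membership, not on the counter. */
static void sigq_handler_patched(struct sigq *q, int sig)
{
    assert(sig >= 0 && sig < MAX_SIGNAL);
    if (!sigq_contains(q, sig)) {
        if (q->len < MAX_SIGNAL)
            q->queue[q->len++] = sig;
    }
    q->count[sig]++;
}
```

With this check, a signal whose counter is already non-zero but which is missing from the queue gets enqueued again on its next delivery, and a signal that is already queued is not added twice. The scan is bounded by MAX_SIGNAL entries.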
From: Alan Fitton [mailto:[email protected]]
Sent: 17 March 2015 16:02
To: [email protected]
Subject: HAProxy signal queue not working correctly
Hello,
We are in the process of deploying HAProxy to replace our existing internal
load balancers, with 41 installations in our test environment. Backends will
be added to and removed from the configuration automatically (maybe a few
times an hour), and then the "reload" functionality is used.
Every few days, I find that 2 to 4 of them have ended up in a state where the
reload function doesn't work. More specifically, SIGTTOU is ignored by the
existing HAProxy process, so the new one is unable to bind to its port.
I've been looking at the way HAProxy does signal handling and inspecting the
process using gdb. I think I can see why the signal is ignored, but am unsure
how exactly it ends up in this state.
Basically, signal_queue isn't being updated with a reference to SIGTTOU,
because signal_state[SIGTTOU].count is > 0. I guess the code assumes that if a
given signal already has events counted in signal_state, then it must already
be in signal_queue, so those events will get processed soon. But from what I
see below, this doesn't always seem to be the case, and then all events of a
particular signal can end up getting "lost". I think there is some timing or
logic issue here.
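The stuck state can be reproduced in a small standalone model. This is only a sketch of the logic as I understand it: struct sig_model, model_signal_handler and model_is_reachable are hypothetical names, MAX_SIGNAL is assumed to be 256, and the count increment after the check is inferred from the counter going up in gdb.

```c
#include <assert.h>

#define MAX_SIGNAL 256 /* assumed value, for this model only */

/* Simplified stand-in for the state in src/signal.c (model only). */
struct sig_model {
    unsigned int count[MAX_SIGNAL]; /* pending events per signal */
    int queue[MAX_SIGNAL];          /* signals waiting to be processed */
    int queue_len;                  /* number of valid queue entries */
};

/* Mirrors the unpatched check: enqueue only when the count was zero. */
static void model_signal_handler(struct sig_model *m, int sig)
{
    assert(sig >= 0 && sig < MAX_SIGNAL);
    if (!m->count[sig]) {
        /* signal was not queued yet */
        if (m->queue_len < MAX_SIGNAL)
            m->queue[m->queue_len++] = sig;
    }
    m->count[sig]++;
}

/* Returns 1 if a consumer walking the queue would ever see sig. */
static int model_is_reachable(const struct sig_model *m, int sig)
{
    int i;

    for (i = 0; i < m->queue_len; i++)
        if (m->queue[i] == sig)
            return 1;
    return 0;
}
```

Starting from count[22] = 805 with queue_len = 0, every further call to model_signal_handler only increments the counter and the signal never becomes reachable through the queue, which matches what gdb shows. The model does not explain how the counter and the queue get out of sync in the first place.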
(22 = SIGTTOU)
/* Break on SIGTTOU. There are 805 events in the
Program received signal SIGTTOU, Stopped (tty output).
0x00002b369ab6a373 in __epoll_wait_nocancel () from /lib64/libc.so.6
(gdb) print signal_state[22]
$16 = {count = 805, handlers = {n = 0xe1efa80, p = 0xe1efa80}}
(gdb) print signal_queue_len
$17 = 0
(gdb) c
Continuing.
Program received signal SIGTTIN, Stopped (tty input).
0x00002b369aac5320 in sigaction () from /lib64/libc.so.6
(gdb) print signal_queue_len
$18 = 0
(gdb) print signal_state[22]
$19 = {count = 806, handlers = {n = 0xe1efa80, p = 0xe1efa80}}
<-- the signal has been counted, but its events never get processed
(gdb) c
Continuing.
This is on RHEL5. Reload functionality is the reason we chose haproxy so it's
really important to us that it works correctly :) Please let me know if any
more details would be useful.
Thanks and Best Regards,