On Wed, Mar 27, 2019 at 02:24:23PM +0100, William Lallemand wrote:
> 
> On Wed, Mar 27, 2019 at 01:59:59PM +0300, Максим Куприянов wrote:
> > Hi, everybody! Got a core on 1.9.5.
> >
> 
> Hello,
> 
> How did it happened? Can you reproduce?
> Can you share the systemd log of the service so we can see the forking and
> leaving of the workers.
> 
> Thanks.
> 
>  
> > Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p
> > /var/run/haproxy.pid'.
> > Program terminated with signal SIGABRT, Aborted.
> > #0  0x00007fe164fd0428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > (gdb) thread apply all bt
> > 
> > Thread 1 (Thread 0x7fe1668c7180 (LWP 169672)):
> > #0  0x00007fe164fd0428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #1  0x00007fe164fd202a in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #2  0x00007fe1650127ea in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #3  0x00007fe16501b37a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #4  0x00007fe16501f53c in free () from /lib/x86_64-linux-gnu/libc.so.6
> > #5  0x000055cee0588fc4 in mworker_catch_sigchld (sh=<optimized out>) at
> > src/haproxy.c:872
> > #6  0x000055cee0618f7b in __signal_process_queue () at src/signal.c:90
> > #7  0x000055cee058b3e7 in signal_process_queue () at
> > include/proto/signal.h:39
> > #8  run_poll_loop () at src/haproxy.c:2643
> > #9  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2703
> > #10 0x000055cee04e7088 in mworker_loop () at src/haproxy.c:930
> > #11 main (argc=<optimized out>, argv=0x7ffff429aa28) at src/haproxy.c:3125
> > 
> 
> Looks like it tried to remove a child which is not in the process list 
> anymore,
> it's not supposed to happened. Can you share your configuration? 
> 
> Thanks.
> 

Can you try the attached patch?

Thanks

-- 
William Lallemand
>From d2d11e15fe8249b1c9709ccfbd35adada1904146 Mon Sep 17 00:00:00 2001
From: William Lallemand <[email protected]>
Date: Wed, 27 Mar 2019 14:19:10 +0100
Subject: [PATCH] BUG/MEDIUM: mworker: don't free the wrong child when not
 found

A bug occurs when the sigchld handler is called and a child which is
not in the process list just left, or with an empty process list.

The child variable won't be set and left as an uninitialized variable or
set to the wrong child entry, which can lead to a free of this
uninitialized variable or of the wrong child.

This can lead to a crash of the master during a stop or a reload.

This patch strengthens the case of the missing child by setting the
variable to NULL when not found.

This patch must be backported to 1.9.
---
 src/haproxy.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/haproxy.c b/src/haproxy.c
index 646f040db..89cacc7b8 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -827,7 +827,7 @@ static void mworker_catch_sigchld(struct sig_handler *sh)
 {
 	int exitpid = -1;
 	int status = 0;
-	struct mworker_proc *child, *it;
+	struct mworker_proc *child = NULL, *it;
 
 restart_wait:
 
@@ -843,15 +843,17 @@ restart_wait:
 			status = 255;
 
 		list_for_each_entry_safe(child, it, &proc_list, list) {
-			if (child->pid != exitpid)
+			if (child->pid != exitpid) {
+				child = NULL;
 				continue;
+			}
 
 			LIST_DEL(&child->list);
 			close(child->ipc_fd[0]);
 			break;
 		}
 
-		if (!children) {
+		if (!children || !child) {
 			ha_warning("Worker %d exited with code %d (%s)\n", exitpid, status, (status >= 128) ? strsignal(status - 128) : "Exit");
 		} else {
 			/* check if exited child was in the current children list */
-- 
2.19.2

Reply via email to