On Mon, Jan 22, 2018 at 05:47:55PM +0100, Willy Tarreau wrote:
> > strace: Process 12166 attached
> > [pid 12166] set_robust_list(0x7ff9bc9aa9e0, 24 <unfinished ...>
> > [pid 12166] <... set_robust_list resumed> ) = 0
> > [pid 12166] gettimeofday({1516289044, 684014}, NULL) = 0
> > [pid 12166] mmap(NULL, 134217728, PROT_NONE,
> > MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
> > [pid 12166] <... mmap resumed> ) = 0x7ff9ac000000
> > [pid 12166] munmap(0x7ff9b0000000, 67108864) = 0
> > [pid 12166] mprotect(0x7ff9ac000000, 135168, PROT_READ|PROT_WRITE
> > <unfinished ...>
> > [pid 12166] <... mprotect resumed> ) = 0
> > [pid 12166] mmap(NULL, 8003584, PROT_READ|PROT_WRITE,
> > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
> > [pid 12166] <... mmap resumed> ) = 0x7ff9baa65000
> > [pid 12166] close(16 <unfinished ...>
> > [pid 12166] <... close resumed> ) = 0
> > [pid 12166] fcntl(15, F_SETFL, O_RDONLY|O_NONBLOCK <unfinished ...>
> > [pid 12166] <... fcntl resumed> ) = 0
>
> Here it's getting obvious that it was a shared file descriptor :-(
So I have a suspect here:
- run_thread_poll_loop() runs in every thread once the threads are created
- the first thing it does is close the master-worker pipe FD:

    (...)
    if (global.mode & MODE_MWORKER)
            mworker_pipe_register(mworker_pipe);
    (...)

    void mworker_pipe_register(int pipefd[2])
    {
            close(mworker_pipe[1]); /* close the write end of the master pipe in
                                       the children */
            fcntl(mworker_pipe[0], F_SETFL, O_NONBLOCK);
            (...)
    }

Does that look familiar compared with the trace above?
So I guess your config runs in master-worker mode, am I right?
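
To illustrate why a repeated close() is dangerous rather than merely
redundant, here is a minimal standalone sketch. It is not HAProxy code and
the function name is made up for the illustration. Once the first close()
has released the descriptor number, the kernel hands the lowest free number
to whatever is opened next, so a second close() on the stale number can
take down an unrelated descriptor:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a stale second close() closed an
 * unrelated descriptor, -1 on setup failure. */
int stale_close_hits_new_fd(void)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;

    int stale = p[1];
    close(p[1]);                 /* first "thread" closes the write end */

    /* Something else gets opened in between. dup() returns the lowest
     * free descriptor number, which is the one that was just released. */
    int fresh = dup(p[0]);
    if (fresh < 0) {
        close(p[0]);
        return -1;
    }
    assert(fresh == stale);      /* the old number has been reused */

    close(stale);                /* second "thread" repeats the close */

    /* fcntl() fails with EBADF if the new descriptor was closed. */
    int victim_closed = (fcntl(fresh, F_GETFD) == -1);

    close(p[0]);
    return victim_closed;
}
```

With several threads running the same startup code, the window between the
first and the later close() calls is exactly where another thread may have
opened a socket or a pipe.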
Note that I'm also bothered by the call to protocol_enable_all() in
this function, since every thread runs it and so the proxies get started
multiple times, possibly unsafely. That may explain a lot of things suddenly!
I think the attached patch works around it, but I'd like your
confirmation before cleaning it up.
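
For reference, the idea behind the workaround can be sketched on its own as
below. This is only an illustration under assumptions: it uses a pthread
mutex instead of HAProxy's spinlock macros, and demo_pipe, demo_lock,
demo_pipe_register() and demo_register_n_times() are hypothetical names.
The registration is serialized, and the write end is marked as -1 after the
first close() so that later callers skip it:

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static int demo_pipe[2] = { -1, -1 };   /* hypothetical stand-in for mworker_pipe */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int close_calls;                 /* number of times close() really ran */

/* Meant to be callable from every thread: only the first caller closes
 * the write end, the others see -1 and do nothing. */
void demo_pipe_register(void)
{
    pthread_mutex_lock(&demo_lock);
    if (demo_pipe[1] >= 0) {
        close(demo_pipe[1]);
        demo_pipe[1] = -1;              /* remember that it is already closed */
        close_calls++;
    }
    pthread_mutex_unlock(&demo_lock);
}

/* Calls the registration n times, as n threads would at startup (done
 * sequentially here to keep the sketch short), and returns how many
 * close() calls actually happened. Returns -1 if pipe() fails. */
int demo_register_n_times(int n)
{
    if (pipe(demo_pipe) < 0)
        return -1;

    close_calls = 0;
    for (int i = 0; i < n; i++)
        demo_pipe_register();

    assert(demo_pipe[1] == -1);         /* write end is marked closed */

    close(demo_pipe[0]);
    demo_pipe[0] = -1;
    return close_calls;
}
```

Whatever the number of callers, the write end is closed exactly once.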
Thanks,
Willy
diff --git a/src/haproxy.c b/src/haproxy.c
index 20b18f8..66639fc 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2339,7 +2339,11 @@ void mworker_pipe_handler(int fd)
 void mworker_pipe_register(int pipefd[2])
 {
+	if (mworker_pipe[1] < 0)
+		return;
+
 	close(mworker_pipe[1]); /* close the write end of the master pipe in
 	                           the children */
+	mworker_pipe[1] = -1;
 	fcntl(mworker_pipe[0], F_SETFL, O_NONBLOCK);
 	fdtab[mworker_pipe[0]].owner = mworker_pipe;
@@ -2408,6 +2412,7 @@ static void *run_thread_poll_loop(void *data)
 {
 	struct per_thread_init_fct *ptif;
 	struct per_thread_deinit_fct *ptdf;
+	static __maybe_unused HA_SPINLOCK_T start_lock;
 	tid = *((unsigned int *)data);
 	tid_bit = (1UL << tid);
@@ -2420,10 +2425,12 @@ static void *run_thread_poll_loop(void *data)
 		}
 	}
+	HA_SPIN_LOCK(LISTENER_LOCK, &start_lock);
 	if (global.mode & MODE_MWORKER)
 		mworker_pipe_register(mworker_pipe);
 	protocol_enable_all();
+	HA_SPIN_UNLOCK(LISTENER_LOCK, &start_lock);
 	THREAD_SYNC_ENABLE();
 	run_poll_loop();