Hit enter too fast, with the patch now.

On Tue, Jun 11, 2019 at 09:06:46AM +0200, Willy Tarreau wrote:
> Hi again Pieter,
> 
> On Tue, Jun 11, 2019 at 04:24:47AM +0200, Willy Tarreau wrote:
> > I'm
> > going to have a look at this this morning. I now see how to make things
> > worse to observe the changes, I suspect that forcing a high nbthread and
> > binding all of them to a single CPU should reveal the issue much better.
> 
> So I cannot reproduce your cases but by cheating I could make a very
> slight difference : I have started 50 processes in parallel, all on
> CPU #0, and all having 64 threads. That's a total of 3200 threads on
> a single CPU. Doing this with the TLS health check regtest, I see that
> before the patches it took 14.2 seconds and after it took 14.7. However
> by modifying the startup code with the attached patch, it goes down to
> 11.3 seconds.
> 
> I'd like you to give it a try in your environment to confirm whether or
> not it does improve things. If so, I'll clean it up and merge it. I'm
> also interested in any reproducer you could have, given that the made-up
> test case I did above doesn't even show anything alarming.
> 
> Thank you!
> Willy
diff --git a/src/haproxy.c b/src/haproxy.c
index a8898b78d..ca7cb77d5 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2556,6 +2556,10 @@ static void run_poll_loop()
        }
 }
 
+static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t  init_cond  = PTHREAD_COND_INITIALIZER;
+static int waiters = 0;
+
 static void *run_thread_poll_loop(void *data)
 {
        struct per_thread_alloc_fct  *ptaf;
@@ -2577,7 +2581,11 @@ static void *run_thread_poll_loop(void *data)
         * after reallocating them locally. This will also ensure there is
         * no race on file descriptors allocation.
         */
-       thread_isolate();
+
+       pthread_mutex_lock(&init_mutex);
+       /* first one must set the number of waiters */
+       if (!waiters)
+               waiters = global.nbthread;
 
        tv_update_date(-1,-1);
 
@@ -2608,14 +2616,20 @@ static void *run_thread_poll_loop(void *data)
         * we want all threads to have already allocated their local fd tables
         * before doing so.
         */
-       thread_sync_release();
-       thread_isolate();
 
-       if (tid == 0)
+       waiters--;
+       /* the last one is responsible for starting the listeners */
+       if (waiters == 0)
                protocol_enable_all();
 
-       /* done initializing this thread, don't start before others are done */
-       thread_sync_release();
+       pthread_cond_broadcast(&init_cond);
+       pthread_mutex_unlock(&init_mutex);
+
+       /* now wait for other threads to finish starting */
+       pthread_mutex_lock(&init_mutex);
+       while (waiters)
+               pthread_cond_wait(&init_cond, &init_mutex);
+       pthread_mutex_unlock(&init_mutex);
 
        run_poll_loop();
 
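For anyone who wants to play with the idea outside of haproxy, here is a
minimal standalone sketch of the same startup pattern: each thread runs its
init under a mutex, the last one to finish performs the one-time step, and
nobody proceeds before all are done. It is only an illustration under a few
assumptions, not the patch itself: thread_start(), run_startup_demo(),
MAX_THREADS and the counters initialized, listeners_started and seen_at_start
are made-up names, the waiters count is set by the launcher instead of by the
first thread, and the unlock/relock pair from the patch is merged into a
single critical section.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 64          /* hypothetical limit for this sketch */

static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  init_cond  = PTHREAD_COND_INITIALIZER;
static int waiters;             /* threads that have not finished init yet */
static int initialized;         /* threads that completed their local init */
static int listeners_started;   /* how many times the one-time step ran */
static int seen_at_start;       /* value of 'initialized' at that moment */

static void *thread_start(void *arg)
{
	(void)arg;

	pthread_mutex_lock(&init_mutex);

	/* per-thread initialization, serialized by the mutex */
	initialized++;

	/* the last one is responsible for the one-time step
	 * (protocol_enable_all() in the patch)
	 */
	if (--waiters == 0) {
		listeners_started++;
		seen_at_start = initialized;
		pthread_cond_broadcast(&init_cond);
	}

	/* now wait for other threads to finish starting */
	while (waiters)
		pthread_cond_wait(&init_cond, &init_mutex);

	pthread_mutex_unlock(&init_mutex);

	/* a real thread would enter its poll loop here */
	return NULL;
}

/* Starts n threads and returns 1 if the one-time step ran exactly once,
 * after all n threads were initialized; 0 otherwise; -1 on a bad n.
 */
static int run_startup_demo(int n)
{
	pthread_t th[MAX_THREADS];
	int i;

	if (n < 1 || n > MAX_THREADS)
		return -1;

	waiters = n;
	initialized = listeners_started = seen_at_start = 0;

	for (i = 0; i < n; i++)
		if (pthread_create(&th[i], NULL, thread_start, NULL) != 0)
			abort();

	for (i = 0; i < n; i++)
		pthread_join(th[i], NULL);

	return listeners_started == 1 && seen_at_start == n;
}
```

Calling run_startup_demo(8) from a small main() and building with -pthread
should return 1. The point of sleeping in pthread_cond_wait() is that waiting
threads are taken off the CPU instead of spinning, which is what ought to
matter when many threads share a single CPU.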
