Hit enter too fast; here is the patch now.
On Tue, Jun 11, 2019 at 09:06:46AM +0200, Willy Tarreau wrote:
> Hi again Pieter,
>
> On Tue, Jun 11, 2019 at 04:24:47AM +0200, Willy Tarreau wrote:
> > I'm
> > going to have a look at this this morning. I now see how to make things
> > worse to observe the changes, I suspect that forcing a high nbthread and
> > binding all of them to a single CPU should reveal the issue much better.
>
> So I cannot reproduce your cases, but by cheating I could make a very
> slight difference: I have started 50 processes in parallel, all on
> CPU #0, each with 64 threads. That's a total of 3200 threads on
> a single CPU. Doing this with the TLS health check regtest, I see that
> before the patches it took 14.2 seconds and after it took 14.7. However,
> by modifying the startup code with the attached patch, it goes down to
> 11.3 seconds.
>
> I'd like you to give it a try in your environment to confirm whether or
> not it improves things. If so, I'll clean it up and merge it. I'm
> also interested in any reproducer you may have, given that the made-up
> test case I did above doesn't even show anything alarming.
>
> Thank you!
> Willy
diff --git a/src/haproxy.c b/src/haproxy.c
index a8898b78d..ca7cb77d5 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2556,6 +2556,10 @@ static void run_poll_loop()
}
}
+static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t init_cond = PTHREAD_COND_INITIALIZER;
+static int waiters = 0;
+
static void *run_thread_poll_loop(void *data)
{
struct per_thread_alloc_fct *ptaf;
@@ -2577,7 +2581,11 @@ static void *run_thread_poll_loop(void *data)
* after reallocating them locally. This will also ensure there is
* no race on file descriptors allocation.
*/
- thread_isolate();
+
+ pthread_mutex_lock(&init_mutex);
+ /* first one must set the number of waiters */
+ if (!waiters)
+ waiters = global.nbthread;
tv_update_date(-1,-1);
@@ -2608,14 +2616,20 @@ static void *run_thread_poll_loop(void *data)
* we want all threads to have already allocated their local fd tables
* before doing so.
*/
- thread_sync_release();
- thread_isolate();
- if (tid == 0)
+ waiters--;
+ /* the last one is responsible for starting the listeners */
+ if (waiters == 0)
protocol_enable_all();
- /* done initializing this thread, don't start before others are done */
- thread_sync_release();
+ pthread_cond_broadcast(&init_cond);
+ pthread_mutex_unlock(&init_mutex);
+
+ /* now wait for other threads to finish starting */
+ pthread_mutex_lock(&init_mutex);
+ while (waiters)
+ pthread_cond_wait(&init_cond, &init_mutex);
+ pthread_mutex_unlock(&init_mutex);
run_poll_loop();