Re: High CPU Usage followed by segfault error

Willy Tarreau Tue, 16 Oct 2018 07:12:02 -0700

Hi,

On Tue, Oct 16, 2018 at 09:28:46AM +0530, Soji Antony wrote:
> FYI, the initial version which we were using before upgrading to 1.8.14
> was 1.8.13.
> By mistake updated it as 1.8.3 in my first email.


No problem, thanks for the clarification. After re-reading the code, I found
a bug which is extremely difficult to trigger and which could induce the
situation you're running into. It requires a race between two "show fd"
on the CLI, where the second must start exactly when the first one stops.
I couldn't reproduce it since the window is too narrow, but theoretically it
exists, and the fact that you don't see it often could confirm this.

However since you're facing the same issue with 1.8.13 which uses the older
synchronization point, I still have some doubts about the root cause. But
we know the last one could be a bit tricky so we can't rule out a slightly
different issue giving overall the same visible effects.

Could you please apply the attached patch? I'm going to merge it into 1.9
and we'll backport it to 1.8 later.

By the way, any reason you're running with SCHED_RR? It might make things
worse during reloads by letting some threads spin on their own spinlocks
without offering a chance to the same thread of the other process to complete
its work.
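
To check which scheduling policy a process actually runs under, something
like the following minimal sketch could be used. It relies only on the POSIX
sched_getscheduler() call; the helper names (policy_name, is_realtime_policy,
report_policy) are made up for this illustration and are not part of haproxy.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Map a scheduling policy constant to a readable name.
 * Hypothetical helper, only the three classic policies are covered. */
const char *policy_name(int policy)
{
	switch (policy) {
	case SCHED_OTHER: return "SCHED_OTHER";
	case SCHED_FIFO:  return "SCHED_FIFO";
	case SCHED_RR:    return "SCHED_RR";
	default:          return "unknown";
	}
}

/* Return 1 for the real-time policies, under which a spinning thread is
 * not preempted by normal (SCHED_OTHER) tasks, 0 otherwise. */
int is_realtime_policy(int policy)
{
	return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Print the policy of process <pid> (0 means the calling process).
 * Returns the policy, or -1 if it could not be retrieved. */
int report_policy(pid_t pid)
{
	int policy = sched_getscheduler(pid);

	if (policy < 0) {
		perror("sched_getscheduler");
		return -1;
	}
	printf("pid %ld: %s%s\n", (long)pid, policy_name(policy),
	       is_realtime_policy(policy) ? " (real-time)" : "");
	return policy;
}
```

Calling report_policy(0) from a small test program, or passing the PID of a
running process, tells whether a real-time policy is in effect.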

Regards,
Willy

diff --git a/src/hathreads.c b/src/hathreads.c
index 9dba4356e..0a7c12f7a 100644
--- a/src/hathreads.c
+++ b/src/hathreads.c
@@ -221,12 +221,8 @@ void thread_isolate()
  */
 void thread_release()
 {
-       while (1) {
-               HA_ATOMIC_AND(&threads_want_rdv_mask, ~tid_bit);
-               if (!(threads_want_rdv_mask & all_threads_mask))
-                       break;
-               thread_harmless_till_end();
-       }
+       HA_ATOMIC_AND(&threads_want_rdv_mask, ~tid_bit);
+       thread_harmless_end();
 }
 
 __attribute__((constructor))

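
The patch above replaces the retry loop with a single atomic AND that clears
the calling thread's bit from the rendezvous mask. As a rough illustration of
that bit-clearing step only, here is a sketch using C11 atomics. The names
want_rdv_mask, request_rdv and withdraw_rdv are hypothetical stand-ins, this
is not how HA_ATOMIC_AND is necessarily implemented, and the sketch does not
model thread_harmless_end().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a mask with one bit per thread that is
 * currently requesting a rendezvous. */
static atomic_ulong want_rdv_mask;

/* Atomically set the calling thread's bit in the mask. */
void request_rdv(unsigned long tid_bit)
{
	atomic_fetch_or(&want_rdv_mask, tid_bit);
}

/* Atomically clear only the calling thread's bit, leaving the bits of
 * other threads untouched. Returns the mask value produced by this
 * update, i.e. the previous value with <tid_bit> removed. */
unsigned long withdraw_rdv(unsigned long tid_bit)
{
	unsigned long prev = atomic_fetch_and(&want_rdv_mask, ~tid_bit);

	/* the caller is expected to have requested the rendezvous first */
	assert(prev & tid_bit);
	return prev & ~tid_bit;
}
```

With two threads having requested a rendezvous, the first withdrawal leaves
the other thread's bit set and the second one brings the mask back to zero.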