From: Steven Rostedt (VMware) <[email protected]>

Joel Fernandes found that the synchronize_rcu_tasks() was taking a
significant amount of time. He demonstrated it with the following test:
# cd /sys/kernel/tracing
# while [ 1 ]; do x=1; done &
# echo '__schedule_bug:traceon' > set_ftrace_filter
# time echo '!__schedule_bug:traceon' > set_ftrace_filter;

real	0m1.064s
user	0m0.000s
sys	0m0.004s

It takes a little over a second to perform the synchronization, because
there is a loop that waits one second at a time for tasks to get through
their quiescent points when there is a task that must be waited for.

After discussion, we came up with a simple approach: still wait for
holdouts, but start with a short wait and increase the wait time on each
iteration of the loop, never exceeding a full second. I also noticed
that there is a final HZ/10 wait at the end of the loop for no apparent
reason (at least there is nothing documenting why that sleep is there),
so I removed that as well.

With the new patch we have:

# time echo '!__schedule_bug:traceon' > set_ftrace_filter;

real	0m0.124s
user	0m0.000s
sys	0m0.004s

which drops the wait down to about 12% of the original time
(0.124s vs 1.064s).

Link: http://lkml.kernel.org/r/[email protected]

Reported-by: Joel Fernandes (Google) <[email protected]>
Suggested-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 68fa19a5e7bd..0f6091097abc 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -715,6 +715,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 	struct rcu_head *list;
 	struct rcu_head *next;
 	LIST_HEAD(rcu_tasks_holdouts);
+	int fract;
 
 	/* Run on housekeeping CPUs by default. Sysadm can move if desired. */
 	housekeeping_affine(current, HK_FLAG_RCU);
@@ -796,13 +797,25 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 		 * holdouts. When the list is empty, we are done.
 		 */
 		lastreport = jiffies;
-		while (!list_empty(&rcu_tasks_holdouts)) {
+
+		/* Start off with HZ/10 wait and slowly back off to 1 HZ wait */
+		fract = 10;
+
+		for (;;) {
 			bool firstreport;
 			bool needreport;
 			int rtst;
 			struct task_struct *t1;
 
-			schedule_timeout_interruptible(HZ);
+			if (list_empty(&rcu_tasks_holdouts))
+				break;
+
+			/* Slowly back off waiting for holdouts */
+			schedule_timeout_interruptible(HZ/fract);
+
+			if (fract > 1)
+				fract--;
+
 			rtst = READ_ONCE(rcu_task_stall_timeout);
 			needreport = rtst > 0 &&
 				     time_after(jiffies, lastreport + rtst);
@@ -848,7 +861,6 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 			list = next;
 			cond_resched();
 		}
-		schedule_timeout_uninterruptible(HZ/10);
 	}
 }
 
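For anyone who wants to see the backoff arithmetic in isolation, below
is a minimal standalone sketch (plain userspace C, not kernel code). It
only tabulates the per-iteration sleep the patched loop would request;
the HZ value of 1000 and the 12-iteration cutoff are assumptions for
illustration, not values taken from the patch.

/*
 * Standalone sketch of the backoff pattern above (not kernel code):
 * start at HZ/10 and back off toward a full HZ per iteration, as the
 * patched loop does, printing the cumulative wait in jiffies.
 */
#include <stdio.h>

#define HZ 1000	/* assumed jiffies per second, for illustration only */

int main(void)
{
	int fract = 10;	/* matches the patch: first wait is HZ/10 */
	int total = 0;	/* cumulative wait in jiffies */
	int iter;

	for (iter = 1; iter <= 12; iter++) {
		int wait = HZ / fract;	/* this iteration's sleep */

		total += wait;
		printf("iter %2d: wait %4d jiffies, total %5d\n",
		       iter, wait, total);

		if (fract > 1)
			fract--;	/* slowly back off toward 1 HZ */
	}
	return 0;
}

With these numbers, a holdout that clears during the first sleep costs
the waiter only HZ/10 (about 100ms), which lines up with the measured
0m0.124s above, while a long-lived holdout still converges to the old
once-per-second polling cadence.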

