Hello All,

An SMT mode switch on a system with a large CPU count takes close to an
hour to complete. Initial debugging root-caused the delay to the CPU
hotplug subsystem blocking on numerous synchronize_rcu() calls.
Simply enabling system-wide RCU expediting reduced the switch time to
5-6 minutes. Since then, several approaches have been explored; some
had side effects of their own, and others did not work as expected.

Approaches explored:

1. Expediting individual CPU hotplug operations by wrapping
_cpu_up()/_cpu_down() with rcu_expedite_gp()/rcu_unexpedite_gp() [0].
Peter suggested expediting only when the SMT switch is triggered via the
sysfs control interface, not for individual hotplug operations [1].

2. Replacing synchronize_rcu() calls in the CPU hotplug codepath with
their expedited variants. This is not viable because one
synchronize_rcu() is invoked inside cpus_write_lock(), which is shared
with other kernel subsystems [5].

3. Hoisting cpus_write_lock() to be taken once for the entire SMT switch
operation instead of per-CPU [3][4]. On large systems where the SMT
switch can still take 5-6 minutes, holding the lock for that duration
causes hung task splats and starves other subsystems depending on the
read lock.

4. Peter also suggested using rcu_sync_{enter|exit}(), which does not
help on its own but could be paired with approach 2 above.

Current approach: expedite RCU grace periods around the SMT switch
operation in the sysfs control interface path, per Peter's suggestion
[1], with Aboorva's analysis confirming synchronize_rcu() as the
bottleneck [2].

[0] https://lore.kernel.org/all/[email protected]
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/5f2ab8a44d685701fe36cdaa8042a1aef215d10d.ca...@linux.vnet.ibm.com
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://lore.kernel.org/all/[email protected]/
[6] https://lore.kernel.org/all/[email protected]/

Vishal Chourasia (1):
  cpuhp: Expedite RCU when toggling system-wide SMT mode

 include/linux/rcupdate.h | 8 ++++++++
 kernel/cpu.c             | 4 ++++
 kernel/rcu/rcu.h         | 4 ----
 3 files changed, 12 insertions(+), 4 deletions(-)

-- 
2.54.0