On Mon, Jan 12, 2026 at 11:57:41PM +0530, Vishal Chourasia wrote:
> Hello Joel, Paul, Uladzislau,
> 
> On Mon, Jan 12, 2026 at 06:05:30PM +0100, Uladzislau Rezki wrote:
> > On Mon, Jan 12, 2026 at 08:48:42AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 12, 2026 at 04:09:49PM +0000, Joel Fernandes wrote:
> > > > 
> > > > 
> > > > > On Jan 12, 2026, at 7:57 AM, Uladzislau Rezki <[email protected]> 
> > > > > wrote:
> > > > > 
> > > > >> 
> > > > > Sounds good to me. I agree it is better to bypass parameters.
> > > > 
> > > > Another way to make it in-kernel would be to make the RCU normal wake 
> > > > from GP optimization enabled for > 16 CPUs by default.
> > > > 
> > > > I was considering this, but I did not bring it up because I did not 
> > > > know that there are large systems that might benefit from it until now.
> > > 
> > > This would require increasing the scalability of this optimization,
> > > right?  Or am I thinking of the wrong optimization?  ;-)
> > > 
> > I tested this before. I noticed that beyond 64K simultaneous
> > synchronize_rcu() calls, scalability becomes an issue. Anything
> > fewer was faster with the new approach.
> 
> It is worth noting that bulk CPU hotplug represents a different stress
> pattern than the "simultaneous call" scenario mentioned above.
> 
> In a large-scale hotplug event (such as an SMT mode switch), we aren't
> necessarily seeing thousands of simultaneous synchronize_rcu() calls.
> Instead, because CPU hotplug operations are serialized, we see a
> "conveyor belt" of sequential calls. One synchronize_rcu() blocks, the
> hotplug state machine waits, it unblocks, and then the next call is
> triggered shortly after.
> 
> The bottleneck here isn't RCU scalability under concurrent load, but
> rather the accumulated latency of hundreds of sequential grace periods.
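To make the serialized-versus-concurrent distinction concrete, here is a deliberately simplified cost model in C. It is an assumption for illustration, not how Tree RCU is implemented: every grace period takes a fixed, hypothetical `gp_ms`; serialized callers each wait for their own grace period; and concurrent callers that arrive together share grace periods, so the whole batch finishes within about two of them. The function names are hypothetical.

```c
/*
 * Simplified model (assumption, not the real Tree RCU implementation):
 * every grace period (GP) takes a fixed gp_ms milliseconds.
 */

/*
 * Serialized callers ("conveyor belt"): each synchronize_rcu() is issued
 * only after the previous one has returned, so each waits a full GP.
 */
static double serialized_wait_ms(unsigned int ncalls, double gp_ms)
{
	return ncalls * gp_ms;
}

/*
 * Concurrent callers: all issued at roughly the same time. A caller that
 * arrives while a GP is already running must wait for the next GP, so
 * in this model the whole batch completes within at most two GPs.
 */
static double concurrent_wait_ms(unsigned int ncalls, double gp_ms)
{
	return ncalls ? 2.0 * gp_ms : 0.0;
}
```

With a hypothetical 10 ms grace period, 350 serialized calls cost about 3500 ms in this model, while 350 concurrent calls cost about 20 ms. This is why batching optimizations help concurrent load far more than they help a serialized hotplug sequence.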
> 
> For example, on pSeries, onlining 350 out of 400 CPUs triggers exactly
> 350 calls at each of three different points in the hotplug state machine. Even
> though they happen one at a time, the sheer volume makes the total
> operation time prohibitive.
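The onlining numbers above work out to 350 × 3 = 1050 back-to-back synchronize_rcu() calls. Below is a minimal sketch of that arithmetic. It assumes every call is fully serialized and waits for one full grace period of a hypothetical fixed length `gp_ms`, and the helper names are illustrative only:

```c
/*
 * Number of back-to-back grace periods in one bulk hotplug pass,
 * assuming each CPU's transition hits every listed call site once
 * and no two calls overlap (hypothetical helper).
 */
static unsigned long serialized_gps(unsigned long ncpus,
				    unsigned long sites_per_cpu)
{
	return ncpus * sites_per_cpu;
}

/*
 * Total time spent waiting for grace periods, given an assumed
 * fixed per-GP latency in milliseconds (hypothetical helper).
 */
static double hotplug_gp_wait_ms(unsigned long ncpus,
				 unsigned long sites_per_cpu,
				 double gp_ms)
{
	return (double)serialized_gps(ncpus, sites_per_cpu) * gp_ms;
}
```

For example, `hotplug_gp_wait_ms(350, 3, 10.0)` returns 10500.0, that is, roughly 10.5 s spent purely waiting for grace periods under an assumed 10 ms grace period.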
> 
> The following call stacks were collected during an SMT mode switch in
> which 350 out of 400 CPUs were onlined:
> 
> @[
>     synchronize_rcu+12
>     cpuidle_pause_and_lock+120
>     pseries_cpuidle_cpu_online+88
>     cpuhp_invoke_callback+500
>     cpuhp_thread_fun+316
>     smpboot_thread_fn+512
>     kthread+308
>     start_kernel_thread+20
> ]: 350
> @[
>     synchronize_rcu+12
>     rcu_sync_enter+260
>     percpu_down_write+76
>     _cpu_up+140
>     cpu_up+440
>     cpu_subsys_online+128
>     device_online+176
>     online_store+220
>     dev_attr_store+52
>     sysfs_kf_write+120
>     kernfs_fop_write_iter+456
>     vfs_write+952
>     ksys_write+132
>     system_call_exception+292
>     system_call_vectored_common+348
> ]: 350
> @[
>     synchronize_rcu+12
>     rcu_sync_enter+260
>     percpu_down_write+76
>     try_online_node+64
>     cpu_up+120
>     cpu_subsys_online+128
>     device_online+176
>     online_store+220
>     dev_attr_store+52
>     sysfs_kf_write+120
>     kernfs_fop_write_iter+456
>     vfs_write+952
>     ksys_write+132
>     system_call_exception+292
>     system_call_vectored_common+348
> ]: 350
> 
> The following call stacks were collected during an SMT mode switch in
> which 350 out of 400 CPUs were offlined:
> 
> @[
>     synchronize_rcu+12
>     rcu_sync_enter+260
>     percpu_down_write+76
>     _cpu_down+188
>     __cpu_down_maps_locked+44
>     work_for_cpu_fn+56
>     process_one_work+508
>     worker_thread+840
>     kthread+308
>     start_kernel_thread+20
> ]: 1
> @[
>     synchronize_rcu+12
>     sched_cpu_deactivate+244
>     cpuhp_invoke_callback+500
>     cpuhp_thread_fun+316
>     smpboot_thread_fn+512
>     kthread+308
>     start_kernel_thread+20
> ]: 350
> @[
>     synchronize_rcu+12
>     cpuidle_pause_and_lock+120
>     pseries_cpuidle_cpu_dead+88
>     cpuhp_invoke_callback+500
>     __cpuhp_invoke_callback_range+200
>     _cpu_down+412
>     __cpu_down_maps_locked+44
>     work_for_cpu_fn+56
>     process_one_work+508
>     worker_thread+840
>     kthread+308
>     start_kernel_thread+20
> ]: 350

I still suggest that you test on a big system.  There are other sources
of synchronize_rcu() calls than just CPU hotplug.  ;-)

                                                        Thanx, Paul
