cpuset: Enable runtime update of nohz_full and managed_irq CPUs

Paul E. McKenney Thu, 02 Jul 2026 16:08:00 -0700

On Thu, Jul 02, 2026 at 05:00:03PM +0200, Thomas Gleixner wrote:
> On Wed, Jul 01 2026 at 16:22, Frederic Weisbecker wrote:
> > On Thu, Jun 25, 2026 at 01:27:54AM -0400, Waiman Long wrote:
> >> That will require some adjustments to the nohz_full related hotplug
> >> functions. I have some ideas of what needs to be done. However, I haven't
> >> looked into RCU yet. I know RCU supports changing the nocb mask for fully
> >> offline CPUs; I will need to find out if it is possible to do that for
> >> partially offline CPUs.
> >
> > No because callbacks can still be enqueued at this stage. But we could
> > manage to make it work with CPUHP_AP_IDLE_DEAD.
> 
> Well, if you go down to CPUHP_AP_IDLE_DEAD then that's not any different
> from going down all the way because the latency spike of stomp_machine()
> for bringing it down is the same.
> 
> You are right that with the current code this is not possible, but it
> should be possible to avoid that altogether.
> 
> The only critical path is when a CPU switches to offload mode. Switching
> to 'yes queue callbacks here' mode is not really interesting.
> 
> Let's look how RCU hot-unplug works:
> 
>   1) CPU is marked !active
> 
>   2) rcutree_offline_cpu() removes the CPU from the fully functional CPU
>      mask
>   
>   3) stomp_machine()
> 
>   4) rcutree_cpu_dying() just traces that the CPU is about to vanish
> 
>   5) Wait for the CPU to report DEAD
> 
>   6) rcutree_migrate_callbacks() mops up the leftover callbacks on the
>      dead CPU
> 
> So if the whole machinery changes to:
> 
>   1) CPU is marked !active
> 
>   2) rcutree_offline_cpu() removes the CPU from the fully functional CPU
>      mask _AND_ marks the CPU as "lightweight offloaded", which means:
> 
>         - no new callbacks can be queued on it anymore, either from the
>           CPU itself or from truly offloaded CPUs
> 
>         - the CPU is still processing already queued callbacks and
>           participates in the GP magic
> 
>   3) Before CPUHP_AP_SCHED_WAIT_EMPTY add a new CPUHP_AP_RCU_SYNC state,
>      which does:
> 
>        - a full RCU synchronization to end all outstanding read side
>          critical sections
> 
>        - drain the now ready callbacks on this CPU
> 
>   4) Proceed to CPUHP_TEARDOWN_CPU, where the operation stops
> 
>   5) Do the magic cpuset changes for the CPU
> 
>   6) Bring CPU back up
> 
> At #4 the half unplugged CPU is not in NOHZ full mode and the tick keeps
> running so all GP processing works as before except that the CPU itself
> is not handling any callbacks because all queued ones are drained and no
> new ones can be queued. When it comes back up it turns into a fully
> offloaded one.
> 
> There are obviously a gazillion details and corner cases to handle,
> but I don't see why this can't be made to work in principle.
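
The proposed sequence quoted above can be sketched as a toy user-space
state machine.  This is only an illustration under stated assumptions,
not kernel code: every type and function name below (toy_cpu,
toy_call_rcu(), and so on) is hypothetical, grace periods are collapsed
into a single step, and CPUHP_AP_RCU_SYNC is the new hotplug state
suggested in the quoted text, not an existing one.

```c
/* Toy user-space model of the proposed flow.  NOT kernel code;
 * all names here are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>

enum toy_cb_mode {
	TOY_CB_LOCAL,       /* CPU queues and invokes its own callbacks */
	TOY_CB_LIGHT_OFFLD, /* "lightweight offloaded": refuses new callbacks */
	TOY_CB_OFFLOADED,   /* callbacks are invoked elsewhere (not modeled) */
};

struct toy_cpu {
	enum toy_cb_mode mode;
	bool active;
	int queued;  /* callbacks still waiting on this CPU */
	int invoked; /* callbacks this CPU has already run */
};

/* Models queueing a callback; returns false if the CPU refuses it. */
static bool toy_call_rcu(struct toy_cpu *cpu)
{
	if (cpu->mode == TOY_CB_LIGHT_OFFLD)
		return false; /* caller would have to queue elsewhere */
	cpu->queued++;
	return true;
}

/* Steps 1 and 2: mark !active and "lightweight offloaded". */
static void toy_step_offline(struct toy_cpu *cpu)
{
	cpu->active = false;
	cpu->mode = TOY_CB_LIGHT_OFFLD;
}

/* Step 3: the suggested CPUHP_AP_RCU_SYNC state.  The grace period
 * is assumed to have ended, so every queued callback is ready. */
static void toy_step_rcu_sync(struct toy_cpu *cpu)
{
	cpu->invoked += cpu->queued;
	cpu->queued = 0;
}

/* Steps 5 and 6: reconfigure and bring the CPU back up offloaded. */
static void toy_step_online_offloaded(struct toy_cpu *cpu)
{
	assert(cpu->queued == 0); /* nothing left to migrate */
	cpu->mode = TOY_CB_OFFLOADED;
	cpu->active = true;
}
```

The point of the model is the invariant checked in the last step: once
new callbacks are refused and the old ones are drained, the switch to
offloaded mode happens with an empty callback list.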


For this case, where it is necessary to adjust the set of nohz_full CPUs
while the real-time workload continues running, and thus presumably also
necessary to adjust the real-time workload's set of CPUs mid-stream,
wouldn't it work better to just leave all CPUs in RCU-callbacks-offloaded
state?  Then you can adjust the nohz_full state of arbitrary CPUs without
messing with RCU.
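
A minimal sketch of why that helps, as a toy user-space model rather
than kernel code.  It assumes the rule that a nohz_full CPU must also
have its RCU callbacks offloaded, represents each CPU set as a 64-bit
mask, and uses hypothetical names throughout (toy_masks,
toy_set_nohz_full()).

```c
/* Toy user-space model.  NOT kernel code; all names are hypothetical. */
#include <assert.h>
#include <stdint.h>

struct toy_masks {
	uint64_t nocb;      /* CPUs with RCU callbacks offloaded */
	uint64_t nohz_full; /* CPUs currently in nohz_full mode */
};

/* Number of CPUs whose callback-offloading state would have to change
 * before new_nohz_full could be a subset of the nocb mask. */
static int toy_nocb_changes_needed(const struct toy_masks *m,
				   uint64_t new_nohz_full)
{
	uint64_t need = new_nohz_full & ~m->nocb;
	int n = 0;

	while (need) {
		need &= need - 1; /* clear lowest set bit */
		n++;
	}
	return n;
}

/* Apply the new nohz_full mask only if RCU need not be touched.
 * Returns the number of nocb changes that blocked the update. */
static int toy_set_nohz_full(struct toy_masks *m, uint64_t new_nohz_full)
{
	int n = toy_nocb_changes_needed(m, new_nohz_full);

	if (n == 0)
		m->nohz_full = new_nohz_full;
	return n;
}
```

With every CPU in the nocb mask, toy_set_nohz_full() succeeds for any
requested mask; with a partial nocb mask, each newly requested CPU
outside it is one more offloading-state change that has to be handled.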

RCU might still have some TOCTOU issues with tick_nohz_full_cpu(),
along with interesting interactions between nohz_full adjustments for
online CPUs and non-RCU portions of the kernel, but this approach would
certainly reduce the number of oddball RCU-centric race conditions that
must be addressed.
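
The TOCTOU shape in question can be illustrated with a deterministic toy
model, again user-space and hypothetical: toy_nohz_full_cpu() merely
stands in for tick_nohz_full_cpu(), and the "concurrent" mask update is
simulated by a callback that runs between the check and the use.

```c
/* Toy user-space model.  NOT kernel code; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t toy_nohz_full_mask; /* stand-in for the kernel's mask */

/* Stand-in for a tick_nohz_full_cpu()-style query. */
static bool toy_nohz_full_cpu(int cpu)
{
	return (toy_nohz_full_mask >> cpu) & 1;
}

/* Check-then-act: sample the state, let a simulated concurrent update
 * run, then compare against the state at time of use.  Returns true
 * when the earlier decision has gone stale. */
static bool toy_decision_is_stale(int cpu, void (*concurrent_update)(void))
{
	bool checked = toy_nohz_full_cpu(cpu); /* time of check */

	if (concurrent_update)
		concurrent_update();           /* runtime mask change */
	return checked != toy_nohz_full_cpu(cpu); /* time of use */
}

/* Simulated runtime update that removes CPU 3 from the mask. */
static void toy_clear_cpu3(void)
{
	toy_nohz_full_mask &= ~(UINT64_C(1) << 3);
}
```

While the mask is fixed, the check can never go stale; once the mask
can change at run time, every such check needs either a recheck or
some form of exclusion against the update.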

Full disclosure:  Frederic was attacking the full-up problem of switching
the RCU callback-offloading state of *online* CPUs initially, but a
continuous stream of race-condition bugs inspired the current state,
which is to allow this state to change only for offline CPUs.  But maybe
Frederic knows a new trick or two?

                                                        Thanx, Paul

Re: [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs
