cpuset: Enable runtime update of nohz_full and managed_irq CPUs

Thomas Gleixner Fri, 03 Jul 2026 13:51:15 -0700

On Fri, Jul 03 2026 at 15:45, Frederic Weisbecker wrote:
> Le Thu, Jul 02, 2026 at 05:00:03PM +0200, Thomas Gleixner a écrit :
>> At #4 the half unplugged CPU is not in NOHZ full mode and the tick keeps
>> running so all GP processing works as before except that the CPU itself
>> is not handling any callbacks because all queued ones are drained and no
>> new ones can be queued. When it comes back up it turns into a fully
>> offloaded one.
>
> But interrupts can still fire and queue callbacks, right?


Sure, but because of

>>   2) rcutree_offline_cpu() removes the CPU from the fully functional CPU
>>      mask _AND_ marks the CPU as "lightweight offloaded", which means:
>> 
>>         - no new callbacks can be queued on it anymore, neither from the
>>           CPU itself nor from truly offloaded CPUs
>> 
>>         - the CPU is still processing already queued callbacks and
>>           participates in the GP magic

the queuing sees "offloaded", so callbacks won't end up on the outgoing
CPU. No?
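
A minimal userspace sketch of the queuing rule described above. It is a
toy model, not kernel code: struct toy_cpu, toy_call_rcu(), toy_drain()
and toy_mark_lightweight_offloaded() are hypothetical names made up for
illustration, and the single offload_cbs counter merely stands in for
wherever offloaded callbacks would really go.

```c
#include <stdbool.h>

#define NR_TOY_CPUS 4

/* Hypothetical per-CPU state; NOT the kernel's struct rcu_data. */
struct toy_cpu {
	bool offloaded;	/* queuing side must bypass the local list */
	int local_cbs;	/* callbacks still pending on the CPU itself */
};

static struct toy_cpu cpus[NR_TOY_CPUS];
static int offload_cbs;	/* callbacks handed to the offload side */

/* Assumed stand-in for step 2) done in rcutree_offline_cpu(). */
static void toy_mark_lightweight_offloaded(int cpu)
{
	cpus[cpu].offloaded = true;
}

/*
 * Queue one callback from code running on @cpu. It does not matter
 * whether the caller is a task or an interrupt handler: both see the
 * same "offloaded" flag and therefore take the same path.
 */
static void toy_call_rcu(int cpu)
{
	if (cpus[cpu].offloaded)
		offload_cbs++;		/* never lands on the outgoing CPU */
	else
		cpus[cpu].local_cbs++;
}

/* The outgoing CPU keeps processing what was queued before the mark. */
static void toy_drain(int cpu)
{
	cpus[cpu].local_cbs = 0;
}
```

In this model an interrupt firing after the mark still calls
toy_call_rcu(), but the callback is accounted to offload_cbs, so the
local list can only shrink.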

>> There are obviously a gazillion details and corner cases to handle,
>> but I don't see why this can't be made to work in principle.
>
> If we need to do something tricky anyway, how about this, which would
> solve the initial problem of hotplug:stop_machine vs. latency-sensitive
> workloads in general?

I'm all for that but there is way more than RCU and places which consult
cpu_online_mask.

Before you get to the point where you can remove stomp_machine() from
the CPU down machinery, you have to go through:

 - All architecture specific code in __cpu_disable()
 
 - All existing (~60) AP callbacks in that section (former DYING
   notification)

and validate that none of that has assumptions about stomp_machine()
protecting them magically.

Back then when I was sanitizing CPU hotplug I looked into that deeply
and looked away pretty fast, and not only because of RCU. If it had
been only RCU, I surely would have pestered Paul enough to get it
fixed. :)

Let me give you some major pain points from my notes in complexity
order from back then:

   - All topology masks

     It's not only cpu_online_mask. There is numa_mask and all sibling,
     core, die, llc, l2c and whatever fancy masks we have and most of
     them are accessed in hotpaths all over the place and many of them
     implicitly rely on the stomp_machine() serialization (due to
     preempt/interrupt disable), unless they use an explicit
     cpus_read_lock() section.

   - RCU

     Plus the SMPCFD part, which has ordering constraints vs. RCU

   - Interrupt migration

     Sounds trivial but with the nastiness of the x86 APIC (w/o
     interrupt remapping) this becomes a nightmare pretty fast.

   - Tick

     Never dived deeply into it, but looking at the on the fly patches
     that's a solv[able|ed] problem.

   - Perf

     There were some truly nasty things in various perf implementations,
     but those got sorted out (at least on x86) due to RT by now. Still
     needs to be looked at.

That's x86 only. I've never looked at any other architecture and their 
callbacks in the stomp_machine() section.

Just looking at your proposal from back then:

    set_cpu_online(cpu, 0)
    synchronize_rcu()
    migrate things // call CPUHP_TEARDOWN_CPU -> CPUHP_AP_IDLE_DEAD

There is a chicken and egg problem right there. synchronize_rcu() running
on the outgoing CPU requires a functional scheduler, as synchronize_rcu()
can sleep on the completion. But you just pulled the rug out from under
the scheduler because you set the CPU offline. So how exactly is the
wakeup, which might be coming from a different CPU, going to work?
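
The ordering problem can be sketched with a deliberately simplified
userspace model. Everything here is hypothetical (struct toy_task,
toy_set_cpu_online(), toy_wake_up()); the only assumption carried over
from the text is that a task bound to the outgoing CPU can be made
runnable again only while that CPU is still marked online.

```c
#include <stdbool.h>

#define NR_TOY_CPUS 4

enum toy_state { TOY_RUNNING, TOY_SLEEPING };

/* Hypothetical task bound to exactly one CPU. */
struct toy_task {
	int cpu;
	enum toy_state state;
};

/* Toy stand-in for cpu_online_mask. */
static bool toy_online[NR_TOY_CPUS] = { true, true, true, true };

/* Toy stand-in for set_cpu_online(cpu, online). */
static void toy_set_cpu_online(int cpu, bool online)
{
	toy_online[cpu] = online;
}

/*
 * Wakeup issued from any CPU. The task may only run on t->cpu, so if
 * that CPU is already offline there is nowhere to enqueue it and the
 * sleeper stays asleep.
 */
static bool toy_wake_up(struct toy_task *t)
{
	if (!toy_online[t->cpu])
		return false;
	t->state = TOY_RUNNING;
	return true;
}
```

With the sequence from the proposal, the task sleeps on CPU N after
toy_set_cpu_online(N, false), and the toy_wake_up() that should end the
wait returns false.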

I totally agree with the long term goal of removing stomp_machine() from
the hotplug machinery completely, but the various subsystems which
depend on it today need to be solved one by one upfront with that goal
in mind. Once we have them out of the way, removing stomp_machine()
becomes trivial. But starting with the removal is a guaranteed
recipe for disaster.

Thanks,

        tglx

Re: [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs
