cpuset: Enable runtime update of nohz_full and managed_irq CPUs

Frederic Weisbecker Fri, 03 Jul 2026 06:19:41 -0700

On Wed, Jul 01, 2026 at 02:56:34PM -0400, Waiman Long wrote:
> On 7/1/26 10:22 AM, Frederic Weisbecker wrote:
> > On Thu, Jun 25, 2026 at 01:27:54AM -0400, Waiman Long wrote:
> > > On 6/24/26 2:34 AM, Jing Wu wrote:
> > > >     3. Are there specific patches in your series where you would welcome
> > > >        our contribution directly?
> > > I have broken down the shutdown callback into separate portions as suggested
> > > by Thomas. The other major change that I am working on is to try to shutdown
> > > to only CPUHP_AP_OFFLINE state instead of all the way down to
> > > CPUHP_OFFLINE.
> > What was the reason for that again? Can we perhaps ask the user to offline
> > the target CPUs before toggling isolation on them?
> The major problem with fully offlining the CPU is the CPU hotplug stop
> machine mechanism, which puts all CPUs except the one being offlined in a
> waiting loop within the IPI handler while the outgoing CPU transitions
> from CPUHP_TEARDOWN_CPU to CPUHP_AP_IDLE_DEAD. If another active isolated
> partition is running DPDK, for instance, this breaks its low-latency
> guarantee for a short duration.


Looks like a long-standing problem that concerns not only nohz_full
but also RT in general.

I made a proposal a while ago to solve this:

https://lore.kernel.org/lkml/[email protected]/

To summarize, we could remove that stop machine thing and have this on the
outgoing CPU at CPUHP_TEARDOWN_CPU:

    set_cpu_online(cpu, 0)
    synchronize_rcu()
    migrate things // call CPUHP_TEARDOWN_CPU -> CPUHP_AP_IDLE_DEAD

And on other CPUs the usual should work:

    preempt_disable() // could now be replaced with rcu_read_lock()
    if (cpu_online(target))
        // do things
    preempt_enable()

There are a few dragons along the way on the update side, but nothing unsolvable
as far as I checked. Of course we must check all those callbacks one by one.

Also on the read side we must be careful because:

    rcu_read_lock()
    A = cpu_online(target)
    B = cpu_online(target)
    rcu_read_unlock()

We can now have A && !B but I doubt many callsites do that.
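
To illustrate the hazard, here is a minimal userspace sketch (not kernel
code). Everything prefixed with model_ is a hypothetical stand-in invented
for this example: a C11 atomic array plays the role of the online mask, and
a callback invoked between the two reads plays the role of the outgoing CPU
clearing its bit while a reader is still inside its read-side section. It
only shows why a call site that needs a consistent view could sample the
state once and act on that snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the online mask; not the kernel API. */
#define MODEL_NR_CPUS 8
static atomic_bool model_online[MODEL_NR_CPUS];

typedef void (*model_hook_t)(void);

static bool model_cpu_online(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return atomic_load_explicit(&model_online[cpu], memory_order_acquire);
}

static void model_set_cpu_online(int cpu, bool online)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    atomic_store_explicit(&model_online[cpu], online, memory_order_release);
}

/*
 * Fragile pattern: two samples inside one read-side section.
 * racing_update() stands in for the outgoing CPU clearing its online bit
 * between the two reads. Returns true when the samples disagree (A && !B).
 */
static bool model_double_sample_disagrees(int target, model_hook_t racing_update)
{
    bool a = model_cpu_online(target);
    racing_update();
    bool b = model_cpu_online(target);
    return a && !b;
}

/*
 * Sturdier pattern: sample once and base every later decision in the
 * section on that snapshot, even if the bit changes afterwards.
 */
static bool model_single_sample(int target, model_hook_t racing_update)
{
    bool online = model_cpu_online(target);
    racing_update();
    return online;
}

/* Example racing updater: marks modelled CPU 3 as no longer online. */
static void model_offline_cpu3(void)
{
    model_set_cpu_online(3, false);
}
```

With modelled CPU 3 initially online, model_double_sample_disagrees(3,
model_offline_cpu3) observes A && !B, whereas model_single_sample() keeps
working from the value it read on entry.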

> > > That will require some adjustments to the nohz_full related hotplug
> > > functions. I have some ideas of what needs to be done. However, I haven't
> > > looked into RCU yet. I know RCU supports changing the nocb mask for fully
> > > offline CPUs; I will need to find out if it is possible to do that for
> > > partially offline CPUs.
> > No because callbacks can still be enqueued at this stage. But we could
> > manage to make it work with CPUHP_AP_IDLE_DEAD.
> 
> If we can only go as high as CPUHP_AP_IDLE_DEAD, we may as well go down all
> the way to CPUHP_OFFLINE as stop machine should be done at
> CPUHP_AP_IDLE_DEAD. In that case, we may have to break RCU out from
> HK_TYPE_KERNEL_NOISE and add a cpuset control switch for the system
> administrators to decide if they are willing to suffer a brief latency spike
> for an existing isolated partition or keep the RCU housekeeping mask
> unchanged to avoid that when creating a new or destroying an old isolated
> partition.

Halfway nohz_full doesn't sound good...

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

Re: [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs
