On 4/1/25 4:41 PM, Waiman Long wrote:
On 4/1/25 3:59 PM, Tejun Heo wrote:
Hello, Waiman.
On Mon, Mar 31, 2025 at 11:12:06PM -0400, Waiman Long wrote:
The problem is the RCU delay between the time a cgroup is killed and
enters the dying state and the time the partition is deactivated when
cpuset_css_offline() is called. That delay can be rather lengthy
depending on the current workload.
If we don't have to do it too often, synchronize_rcu_expedited() may be
workable too. What do you think?
I don't think we ever call synchronize_rcu() in the cgroup code except
for rstat flush. In fact, we didn't use to have an easy way to know if
there were dying cpusets hanging around. Now we can probably use the
root cgroup's nr_dying_subsys[cpuset_cgrp_id] to know if we need to
use a synchronize_rcu*() call to wait for it. However, I still need to
check if there is any race window that could cause us to miss it.
Sorry, I don't think I can use synchronize_rcu_expedited(), as the use
case I see most often is the creation of isolated partitions running
latency-sensitive applications like DPDK. Using
synchronize_rcu_expedited() will send IPIs to all the CPUs, which may
break the latency guarantee those applications require. Plain
synchronize_rcu(), on the other hand, has unpredictable latency that
will hurt the user experience.
Another alternative that I can think of is to scan the remote partition
list for remote partitions, and the sibling cpusets for local
partitions, whenever a conflict is detected while enabling a partition.
When a dying cpuset partition is found, deactivate it immediately to
resolve the conflict. Otherwise, the dying partition will still be
deactivated at cpuset_css_offline() time.
That will be a bit more complex, but I think it can still solve the
problem without adding a new method. What do you think? If you are OK
with that, I will send out a new patch later this week.
If synchronize_rcu_expedited() won't do, let's go with the original patch.
The operation does make general sense in that it's for a distinctive step
in the destruction process, although I'm a bit curious why it's called
before DYING is set.
Because of the above, I still prefer either the original patch or
scanning for dying cpuset partitions whenever a conflict is detected.
Please let me know what you think about it.
Thanks,
Longman