On Sat, Dec 26, 2020 at 06:34:21PM +0800, Hillf Danton wrote:
> On Wed, 23 Dec 2020 11:49:51 -0800 "Paul E. McKenney" wrote:
> >On Sat, Dec 19, 2020 at 01:09:09AM +0800, Lai Jiangshan wrote:
> >> From: Lai Jiangshan <la...@linux.alibaba.com>
> >>
> >> 06249738a41a ("workqueue: Manually break affinity on hotplug")
> >> said that scheduler will not force break affinity for us.
> >>
> >> But workqueue highly depends on the old behavior. Many parts of the codes
> >> relies on it, 06249738a41a ("workqueue: Manually break affinity on
> >> hotplug")
> >> is not enough to change it, and the commit has flaws in itself too.
> >>
> >> It doesn't handle for worker detachment.
> >> It doesn't handle for worker attachement, mainly worker creation
> >> which is handled by Valentin Schneider's patch [1].
> >> It doesn't handle for unbound workers which might be possible
> >> per-cpu-kthread.
> >>
> >> We need to thoroughly update the way workqueue handles affinity
> >> in cpu hot[un]plug, what is this patchset intends to do and
> >> replace the Valentin Schneider's patch [1]. The equivalent patch
> >> is patch 10.
> >>
> >> Patch 1 fixes a flaw reported by Hillf Danton <hdan...@sina.com>.
> >> I have to include this fix because later patches depends on it.
> >>
> >> The patchset is based on tip/master rather than workqueue tree,
> >> because the patchset is a complement for 06249738a41a ("workqueue:
> >> Manually break affinity on hotplug") which is only in tip/master by now.
> >>
> >> And TJ acked to route the series through tip.
> >>
> >> Changed from V1:
> >>   Add TJ's acked-by for the whole patchset
> >>
> >>   Add more words to the comments and the changelog, mainly derived
> >>   from discussion with Peter.
> >>
> >>   Update the comments as TJ suggested.
> >>
> >>   Update a line of code as Valentin suggested.
> >>
> >>   Add Valentin's ack for patch 10 because "Seems alright to me." and
> >>   add Valentin's comments to the changelog which is integral.
> >>
> >> [1]:
> >> https://lore.kernel.org/r/ff62e3ee994efb3620177bf7b19fab16f4866845.ca...@redhat.com
> >> [V1 patcheset]:
> >> https://lore.kernel.org/lkml/20201214155457.3430-1-jiangshan...@gmail.com/
> >>
> >> Cc: Hillf Danton <hdan...@sina.com>
> >> Cc: Valentin Schneider <valentin.schnei...@arm.com>
> >> Cc: Qian Cai <c...@redhat.com>
> >> Cc: Peter Zijlstra <pet...@infradead.org>
> >> Cc: Vincent Donnefort <vincent.donnef...@arm.com>
> >> Cc: Tejun Heo <t...@kernel.org>
> >
> >And rcutorture hits this, so thank you for the fix!
>
> Can you please specify a bit what you encountered in rcutorture
> before this patchset? You know we cant have a correct estimation
> of the fix diameter without your help.

It triggers the following in sched_cpu_dying() in kernel/sched/core.c,
exactly the same as for Lai Jiangshan:

	BUG_ON(rq->nr_running != 1 || rq_has_pinned_tasks(rq))

Which is in fact the "this" in my earlier "rcutorture hits this". ;-)

							Thanx, Paul

> >Tested-by: Paul E. McKenney <paul...@kernel.org>
> >
> >> Lai Jiangshan (10):
> >>   workqueue: restore unbound_workers' cpumask correctly
> >>   workqueue: use cpu_possible_mask instead of cpu_active_mask to break
> >>     affinity
> >>   workqueue: Manually break affinity on pool detachment
> >>   workqueue: don't set the worker's cpumask when kthread_bind_mask()
> >>   workqueue: introduce wq_online_cpumask
> >>   workqueue: use wq_online_cpumask in restore_unbound_workers_cpumask()
> >>   workqueue: Manually break affinity on hotplug for unbound pool
> >>   workqueue: reorganize workqueue_online_cpu()
> >>   workqueue: reorganize workqueue_offline_cpu() unbind_workers()
> >>   workqueue: Fix affinity of kworkers when attaching into pool
> >>
> >>  kernel/workqueue.c | 214 ++++++++++++++++++++++++++++-----------------
> >>  1 file changed, 132 insertions(+), 82 deletions(-)
> >>
> >> --
> >> 2.19.1.6.gb485710b