On Thu, Jul 02, 2020 at 08:57:57AM -0400, Joel Fernandes wrote:
[...]
> > > unconstrained pick, then rq->core_pick is set. The next time task selection
> > > logic runs when stopper needs to switch to idle, the current CPU is not in
> > > the smt_mask. This causes the previous ->core_pick to be picked again which
> > > happens to be the unconstrained task! so the stopper keeps getting selected
> > > forever.
> > >
> > > That and there are a few more safe guards and checks around checking/setting
> > > rq->core_pick. To test it, I ran rcutorture and made it tag all torture
> > > threads. Then ran it in hotplug mode (hotplugging every 200ms) and it hit the
> > > issue. Now it runs for an hour or so without issue. (Torture testing debug
> > > changes: https://bit.ly/38htfqK ).
> > >
> > > Various fixes were tried causing varying degrees of crashes. Finally I found
> > > that it is easiest to just add current CPU to the smt_mask's copy always.
> > > This is so that task selection logic always runs on the current CPU which
> > > called schedule().
> > >
> >
> > It looks good to me.
>
> Thank you for your review! Could I add your Reviewed-by tag to the patch?

Julien and Vineeth, here is my coresched tree updated with this patch for when
you are sending the next series:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch coresched)

There are some trivial fixups to the debug patch, due to this commit. So
pulling from the above branch may save you some time.

thanks,

 - Joel

