On Thu, Jul 02, 2020 at 08:57:57AM -0400, Joel Fernandes wrote:
[...] 
> > > unconstrained pick, then rq->core_pick is set. The next time task selection
> > > logic runs when stopper needs to switch to idle, the current CPU is not in
> > > the smt_mask. This causes the previous ->core_pick to be picked again, which
> > > happens to be the unconstrained task! So the stopper keeps getting selected
> > > forever.
> > > 
> > > That and there are a few more safeguards and checks around checking/setting
> > > rq->core_pick. To test it, I ran rcutorture and made it tag all torture
> > > threads. Then ran it in hotplug mode (hotplugging every 200ms) and it hit the
> > > issue. Now it runs for an hour or so without issue. (Torture testing debug
> > > changes: https://bit.ly/38htfqK ).
> > > 
> > > Various fixes were tried causing varying degrees of crashes.  Finally I found
> > > that it is easiest to just add current CPU to the smt_mask's copy always.
> > > This is so that task selection logic always runs on the current CPU which
> > > called schedule().
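
For anyone following the quoted description, here is a minimal userspace
sketch of that last idea, assuming a plain 64-bit bitmask in place of the
kernel's cpumask. The names cpu_mask_t, selection_mask() and mask_test_cpu()
are hypothetical and are not taken from the patch; the point is only that the
selection loop works on a copy of the sibling mask with the current CPU forced
in, so the CPU that called schedule() is always considered even after hotplug
has removed it from the mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel cpumask: bit n set means CPU n is in the mask. */
typedef uint64_t cpu_mask_t;

/*
 * Return a copy of smt_mask with this_cpu always set.
 * The caller's smt_mask is passed by value and is never modified.
 */
static cpu_mask_t selection_mask(cpu_mask_t smt_mask, unsigned int this_cpu)
{
	assert(this_cpu < 64);
	return smt_mask | ((cpu_mask_t)1 << this_cpu);
}

/* Report whether cpu is present in mask. */
static bool mask_test_cpu(cpu_mask_t mask, unsigned int cpu)
{
	assert(cpu < 64);
	return (mask >> cpu) & 1;
}
```

With CPU 0 going offline and only its sibling CPU 1 left in the mask (0x2),
selection_mask(0x2, 0) still contains CPU 0, so a loop over the copy would
visit the CPU that is running the selection.
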
> > 
> > 
> > It looks good to me. 
> 
> Thank you for your review! Could I add your Reviewed-by tag to the patch?

Julien and Vineeth, here is my coresched tree updated with this patch for
when you are sending the next series:
git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch coresched)

There are some trivial fixups to the debug patch, due to this commit. So
pulling from the above branch may save you some time.

thanks,

 - Joel

