On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> On 8/28/20 4:51 PM, Peter Zijlstra wrote:
> > So where do things go side-ways?
> During hotplug stress test, we have noticed that while a sibling is in
> pick_next_task, another sibling can go offline or come online. What
> we have observed is smt_mask get updated underneath us even if
> we hold the lock. From reading the code, looks like we don't hold the
> rq lock when the mask is updated. This extra logic was to take care of that.

Sure, the mask is updated async, but _where_ is the actual problem with
that?

On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
> Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
> detail the individual hotplug changes squashed into this patch:
> https://lore.kernel.org/lkml/20200815031908.1015049-9-j...@joelfernandes.org/
> https://lore.kernel.org/lkml/20200815031908.1015049-11-j...@joelfernandes.org/

That one looks fishy, the pick is core wide, making that pick_seq per rq
just doesn't make sense.

> https://lore.kernel.org/lkml/20200815031908.1015049-12-j...@joelfernandes.org/

This one reads like tinkering, there is no description of the actual
problem just some code that makes a symptom go away.

Sure, on hotplug the smt mask can change, but only for a CPU that isn't
actually scheduling, so who cares.

/me re-reads the hotplug code...

..ooOO is the problem that we clear the cpumasks on take_cpu_down()
instead of play_dead() ?! That should be fixable.

> https://lore.kernel.org/lkml/20200815031908.1015049-13-j...@joelfernandes.org/

This is the only one that makes some sense, it makes rq->core consistent
over hotplug.

> Agreed we can split the patches for the next series, however for final
> upstream merge, I suggest we fix hotplug issues in this patch itself so that
> we don't break bisectability.

Meh, who sodding cares about hotplug :-). Also you can 'fix' such things
by making sure you can't actually enable core-sched until after
everything is in place.
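
To illustrate that last point, a rough userspace sketch (not kernel code, and not taken from the series) of gating the enable path on a flag that only the final patch flips. All names here (core_sched_hotplug_safe, core_sched_enabled, core_sched_try_enable) are made up for the example; the real series may well do this differently, e.g. through Kconfig.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical flag: stays false in every intermediate patch and is
 * set to true only by the last patch, once the hotplug handling is
 * complete.
 */
static bool core_sched_hotplug_safe;

/* Hypothetical feature state, off by default. */
static bool core_sched_enabled;

/*
 * Refuse to turn the feature on while the series is only partially
 * applied. Any bisection point before the final patch then runs with
 * core-sched disabled, so the hotplug issues cannot trigger there.
 */
static int core_sched_try_enable(void)
{
	if (!core_sched_hotplug_safe)
		return -EOPNOTSUPP;

	core_sched_enabled = true;
	return 0;
}
```

With something along these lines the hotplug fixes can live in their own patches without hurting bisectability, because no intermediate commit can actually run with the feature enabled.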