On Tue, Apr 30, 2019 at 03:51:30AM -0700, Paul E. McKenney wrote:

> > Then I'm not entirely sure how we can return 0 and not run on the
> > expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
> > to 0 are:
> >
> >  - if the mask didn't change
> >  - if we already run inside the new mask
> >  - if we migrated ourself with the stop-task
> >  - if we're not in fact running
> >
> > That last case should never trigger in your circumstances, since @p ==
> > current and current is obviously running. But for completeness, the
> > wakeup of @p would do the task placement in that case.
>
> Are there some diagnostics I could add that would help track this down,
> be it my bug or yours?

Maybe a limited function trace combined with the scheduling tracepoints
would give a clue.

Trouble is, I forever forget how to set that up properly :/ Maybe
something along these lines:

  $ trace-cmd record -p function_graph -g sched_setaffinity -g migration_cpu_stop \
	-e sched_migrate_task -e sched_switch -e sched_wakeup

Also useful would be:

  echo 1 > /proc/sys/kernel/traceoff_on_warning

which ensures the trace stops the moment we hit the failure.
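
For something to run under that trace, a rough userspace analogue of the check might look like the sketch below. This is only an assumption about what a useful reproducer could be: it goes through sched_setaffinity(2) rather than calling set_cpus_allowed_ptr() from a kthread, so it exercises a similar but not identical path, and pin_and_check() is a made-up helper name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: pin the calling thread to the first CPU in its
 * current affinity mask, then ask the kernel where it is running.
 *
 * Returns 0 if we are on the requested CPU, 1 if we are somewhere else
 * even though sched_setaffinity() reported success, and -1 if one of
 * the affinity calls failed or the mask was empty.
 */
int pin_and_check(void)
{
	cpu_set_t allowed, target;
	int cpu, now;

	if (sched_getaffinity(0, sizeof(allowed), &allowed))
		return -1;

	/* Pick the first CPU we are currently allowed to run on. */
	for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &allowed))
			break;
	if (cpu == CPU_SETSIZE)
		return -1;

	CPU_ZERO(&target);
	CPU_SET(cpu, &target);
	if (sched_setaffinity(0, sizeof(target), &target))
		return -1;

	/* A 0 return above should mean we are already on the target CPU. */
	now = sched_getcpu();
	if (now != cpu) {
		fprintf(stderr, "asked for CPU %d, running on CPU %d\n", cpu, now);
		return 1;
	}
	return 0;
}
```

Calling pin_and_check() in a loop from a small main() while the trace is recording should leave the interesting events in the buffer if it ever returns 1.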