On Tue, Apr 30, 2019 at 01:55:51PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 30, 2019 at 03:51:30AM -0700, Paul E. McKenney wrote:
> > > Then I'm not entirely sure how we can return 0 and not run on the
> > > expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
> > > to 0 are:
> > >
> > >  - if the mask didn't change
> > >  - if we already run inside the new mask
> > >  - if we migrated ourself with the stop-task
> > >  - if we're not in fact running
> > >
> > > That last case should never trigger in your circumstances, since @p ==
> > > current and current is obviously running. But for completeness, the
> > > wakeup of @p would do the task placement in that case.
> >
> > Are there some diagnostics I could add that would help track this down,
> > be it my bug or yours?
>
> Maybe limited function trace combined with the scheduling tracepoints
> would give clue.
>
> Trouble is, I forever forget how to set that up properly :/ Maybe
> something along these lines:
>
> $ trace-cmd record -p function_graph -g sched_setaffinity -g
> migration_cpu_stop -e
> sched_migrate_task -e sched_switch -e sched_wakeup
>
> Also useful would be:
>
> echo 1 > /proc/sys/kernel/traceoff_on_warning
>
> which ensures the trace stops the moment we find fail.

OK, what I did was to apply the patch at the end of this email to -rcu
branch dev, then run rcutorture as follows:

nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 --configs "TRIVIAL" --bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"

This resulted in the console output that I placed here:

http://www2.rdrop.com/~paulmck/submission/console.log.gz

But I don't see calls to sched_setaffinity() or migration_cpu_stop().
Steve, is something else needed on the kernel command line in addition to
the following?

ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index da04b5073dc3..ceae80522d64 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -680,12 +680,18 @@ static struct rcu_torture_ops tasks_ops = {
 static void synchronize_rcu_trivial(void)
 {
 	int cpu;
+	static int dont_trace;
 
 	for_each_online_cpu(cpu) {
-		while (raw_smp_processor_id() != cpu)
-			rcutorture_sched_setaffinity(current->pid,
-						     cpumask_of(cpu));
-		WARN_ON_ONCE(raw_smp_processor_id() != cpu);
+		if (!READ_ONCE(dont_trace))
+			tracing_on();
+		rcutorture_sched_setaffinity(current->pid, cpumask_of(cpu));
+		tracing_off();
+		if (raw_smp_processor_id() != cpu) {
+			WRITE_ONCE(dont_trace, 1);
+			WARN_ON_ONCE(1);
+			ftrace_dump(DUMP_ALL);
+		}
 	}
 }
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index caffee644932..edaf0ca22ff7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3495,6 +3495,7 @@ void __init rcu_init(void)
 	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!rcu_par_gp_wq);
 	srcu_init();
+	tracing_off();
 }
 
 #include "tree_stall.h"
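
For reference, the four zero-return paths listed in the quoted text can
be sketched as a small userspace toy model. This is not the kernel's
__set_cpus_allowed_ptr(); every name below (toy_task,
toy_set_cpus_allowed, the OUT_* values) is hypothetical, the cpumask is
reduced to a 64-bit word, and the stop-task migration is modeled as a
plain assignment. It only illustrates why, for a running task, each
successful path should leave the task on a CPU inside the new mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not kernel code. */
enum toy_outcome {
	OUT_EINVAL,			/* new mask has no CPU at all */
	OUT_MASK_UNCHANGED,		/* "the mask didn't change" */
	OUT_ALREADY_IN_MASK,		/* "we already run inside the new mask" */
	OUT_MIGRATED_BY_STOPPER,	/* "migrated ourself with the stop-task" */
	OUT_PLACED_ON_WAKEUP,		/* "we're not in fact running" */
};

struct toy_task {
	uint64_t allowed;	/* bit n set means CPU n is allowed */
	int cpu;		/* CPU the task is on, 0..63 */
	bool running;		/* true when the task is current on its CPU */
};

/* Return the lowest CPU number set in mask, or -1 if mask is empty. */
static int toy_first_cpu(uint64_t mask)
{
	int i;

	for (i = 0; i < 64; i++)
		if (mask & (1ULL << i))
			return i;
	return -1;
}

/* Classify which path a request to change the affinity mask would take. */
static enum toy_outcome toy_set_cpus_allowed(struct toy_task *p,
					     uint64_t new_mask)
{
	int dest;

	if (p->allowed == new_mask)
		return OUT_MASK_UNCHANGED;

	dest = toy_first_cpu(new_mask);
	if (dest < 0)
		return OUT_EINVAL;

	p->allowed = new_mask;
	if (new_mask & (1ULL << p->cpu))
		return OUT_ALREADY_IN_MASK;

	if (p->running) {
		/* Stand-in for the stop-task moving the task to dest. */
		p->cpu = dest;
		return OUT_MIGRATED_BY_STOPPER;
	}

	/* A sleeping task keeps its old CPU until its next wakeup. */
	return OUT_PLACED_ON_WAKEUP;
}
```

In this model, only OUT_PLACED_ON_WAKEUP returns with p->cpu outside the
new mask, and that path requires running == false, which matches the
argument that it cannot apply when @p == current.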

