On Tue, Apr 30, 2019 at 01:55:51PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 30, 2019 at 03:51:30AM -0700, Paul E. McKenney wrote:
> > > Then I'm not entirely sure how we can return 0 and not run on the
> > > expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
> > > to 0 are:
> > > 
> > >  - if the mask didn't change
> > >  - if we already run inside the new mask
> > >  - if we migrated ourself with the stop-task
> > >  - if we're not in fact running
> > > 
> > > That last case should never trigger in your circumstances, since @p ==
> > > current and current is obviously running. But for completeness, the
> > > wakeup of @p would do the task placement in that case.
> > 
> > Are there some diagnostics I could add that would help track this down,
> > be it my bug or yours?
> 
> Maybe a limited function trace combined with the scheduling tracepoints
> would give a clue.
> 
> Trouble is, I forever forget how to set that up properly :/ Maybe
> something along these lines:
> 
> $ trace-cmd record -p function_graph -g sched_setaffinity \
>       -g migration_cpu_stop -e sched_migrate_task \
>       -e sched_switch -e sched_wakeup
> 
> Also useful would be:
> 
> echo 1 > /proc/sys/kernel/traceoff_on_warning
> 
> which ensures the trace stops the moment we hit the failure.

OK, what I did was to apply the patch at the end of this email to -rcu
branch dev, then run rcutorture as follows:

nohup tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 2 \
	--configs "TRIVIAL" \
	--bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop"

This resulted in the console output that I placed here:

http://www2.rdrop.com/~paulmck/submission/console.log.gz

But I don't see calls to sched_setaffinity() or migration_cpu_stop().
Steve, is something else needed on the kernel command line in addition to
the following?

ftrace=function_graph ftrace_graph_filter=sched_setaffinity,migration_cpu_stop

                                                        Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index da04b5073dc3..ceae80522d64 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -680,12 +680,18 @@ static struct rcu_torture_ops tasks_ops = {
 static void synchronize_rcu_trivial(void)
 {
        int cpu;
+       static int dont_trace;
 
        for_each_online_cpu(cpu) {
-               while (raw_smp_processor_id() != cpu)
-                       rcutorture_sched_setaffinity(current->pid,
-                                                    cpumask_of(cpu));
-               WARN_ON_ONCE(raw_smp_processor_id() != cpu);
+               if (!READ_ONCE(dont_trace))
+                       tracing_on();
+               rcutorture_sched_setaffinity(current->pid, cpumask_of(cpu));
+               tracing_off();
+               if (raw_smp_processor_id() != cpu) {
+                       WRITE_ONCE(dont_trace, 1);
+                       WARN_ON_ONCE(1);
+                       ftrace_dump(DUMP_ALL);
+               }
        }
 }
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index caffee644932..edaf0ca22ff7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3495,6 +3495,7 @@ void __init rcu_init(void)
        rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
        WARN_ON(!rcu_par_gp_wq);
        srcu_init();
+       tracing_off();
 }
 
 #include "tree_stall.h"
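
For anyone wanting to poke at the same sequence from userspace, below is a
rough analogue of the patched loop above. It is only a sketch: it assumes
Linux with glibc, the function name check_affinity_migration() is made up
for illustration, and it goes through the sched_setaffinity() system call
rather than the in-kernel rcutorture_sched_setaffinity() path, so a clean
run says nothing about the rcutorture failure itself.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical userspace analogue of the patched synchronize_rcu_trivial()
 * loop: pin the calling thread to each CPU in its current affinity mask and
 * count the cases where it is not running on that CPU after
 * sched_setaffinity() has returned 0.
 *
 * Returns the number of mismatches, or -1 if a system call fails.
 */
int check_affinity_migration(void)
{
	cpu_set_t allowed, one;
	int cpu, now, mismatches = 0;

	/* Only try CPUs this thread is currently allowed to run on. */
	if (sched_getaffinity(0, sizeof(allowed), &allowed))
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;

		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		if (sched_setaffinity(0, sizeof(one), &one))
			return -1;

		/* Same check as the patch: are we on the requested CPU? */
		now = sched_getcpu();
		if (now != cpu) {
			fprintf(stderr, "running on CPU %d, expected %d\n",
				now, cpu);
			mismatches++;
		}
	}

	/* Put the original affinity mask back before returning. */
	if (sched_setaffinity(0, sizeof(allowed), &allowed))
		return -1;

	return mismatches;
}
```

Calling check_affinity_migration() from a small main() and printing the
return value is enough to exercise it. The loop walks the thread's existing
affinity mask rather than every online CPU so that it also behaves when the
process is confined by a cpuset.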
