On Sun, Nov 29, 2020 at 11:31:41AM -0800, Linus Torvalds wrote:
> On Sun, Nov 29, 2020 at 5:38 AM Thomas Gleixner <t...@linutronix.de> wrote:
> >
> > Yet two more places which invoke tracing from RCU disabled regions in the
> > idle path. Similar to the entry path the low level idle functions have to
> > be non-instrumentable.
>
> This really seems less than optimal.
>
> In particular, lookie here:
>
> > @@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
> >
> >  		trace_cpu_idle(1, smp_processor_id());
> >  		stop_critical_timings();
> > +
> > +		/*
> > +		 * arch_cpu_idle() is supposed to enable IRQs, however
> > +		 * we can't do that because of RCU and tracing.
> > +		 *
> > +		 * Trace IRQs enable here, then switch off RCU, and have
> > +		 * arch_cpu_idle() use raw_local_irq_enable(). Note that
> > +		 * rcu_idle_enter() relies on lockdep IRQ state, so switch that
> > +		 * last -- this is very similar to the entry code.
> > +		 */
> > +		trace_hardirqs_on_prepare();
> > +		lockdep_hardirqs_on_prepare(_THIS_IP_);
> >  		rcu_idle_enter();
> > +		lockdep_hardirqs_on(_THIS_IP_);
> > +
> >  		arch_cpu_idle();
> > +
> > +		/*
> > +		 * OK, so IRQs are enabled here, but RCU needs them disabled to
> > +		 * turn itself back on.. funny thing is that disabling IRQs
> > +		 * will cause tracing, which needs RCU. Jump through hoops to
> > +		 * make it 'work'.
> > +		 */
> > +		raw_local_irq_disable();
> > +		lockdep_hardirqs_off(_THIS_IP_);
> >  		rcu_idle_exit();
> > +		lockdep_hardirqs_on(_THIS_IP_);
> > +		raw_local_irq_enable();
> > +
> >  		start_critical_timings();
> >  		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> >  	}
>
> And look at what the code generation for the idle exit path is when
> lockdep isn't even on.
Agreed.

The idea was to flip all of arch_cpu_idle() to not enable interrupts. This is suboptimal for things like x86, where arch_cpu_idle() is basically STI;HLT, but x86 isn't likely to actually use this code path anyway, given all the various cpuidle drivers it has. Many of the other archs are now doing things like arm's: wfi(); raw_local_irq_enable().

Doing that tree-wide interrupt-state flip was something I didn't want to do at this late a stage; the chance of messing that up is just too high.

After that I need to go look at flipping cpuidle, which is even more 'interesting'. cpuidle_enter() has the exact same semantics; it is the code path that x86 actually uses, and there it's inconsistent at best.