On Tue, Jul 09, 2013 at 11:37:28AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 08, 2013 at 06:30:01PM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paul...@linux.vnet.ibm.com>
> >
> > This commit adds fields to the rcu_dyntick structure that are used to
> > detect idle CPUs.  These new fields differ from the existing ones in
> > that the existing ones consider a CPU executing in user mode to be idle,
> > where the new ones consider CPUs executing in user mode to be busy.
> > The handling of these new fields is otherwise quite similar to that for
> > the existing fields.  This commit also adds the initialization required
> > for these fields.
> >
> > So, why is usermode execution treated differently, with RCU considering
> > it a quiescent state equivalent to idle, while in contrast the new
> > full-system idle state detection considers usermode execution to be
> > non-idle?
> >
> > It turns out that although one of RCU's quiescent states is usermode
> > execution, it is not a full-system idle state.  This is because the
> > purpose of the full-system idle state is not RCU, but rather determining
> > when accurate timekeeping can safely be disabled.  Whenever accurate
> > timekeeping is required in a CONFIG_NO_HZ_FULL kernel, at least one
> > CPU must keep the scheduling-clock tick going.  If even one CPU is
> > executing in user mode, accurate timekeeping is required, particularly for
> > architectures where gettimeofday() and friends do not enter the kernel.
> > Only when all CPUs are really and truly idle can accurate timekeeping be
> > disabled, allowing all CPUs to turn off the scheduling clock interrupt,
> > thus greatly improving energy efficiency.
> >
> > This naturally raises the question "Why is this code in RCU rather than in
> > timekeeping?", and the answer is that RCU has the data and infrastructure
> > to efficiently make this determination.
>
> but but but but... why doesn't the regular nohz code qualify? I'd think
> that too would be tracking pretty much the same things, no?

The regular nohz code is identifying which CPUs are idle, but is doing
so on a CPU-by-CPU basis.  Before turning off system-wide timekeeping,
we need to know that -all- of the CPUs are idle.  The regular nohz code
does not do this.

Thanx, Paul
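
To make the distinction concrete, here is a toy userspace sketch of the
two notions of "idle" discussed in this thread.  It is not the code from
the patch: NR_CPUS, enum cpu_state, cpu_states[], and every function
name below are hypothetical, and the plain array scan merely stands in
for whatever mechanism actually makes the system-wide determination.
It only shows that a usermode CPU counts as idle in the RCU view but as
busy in the full-system view, and that the full-system answer depends on
-all- CPUs rather than on any single one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4	/* Hypothetical CPU count for this sketch. */

/* Hypothetical per-CPU execution states. */
enum cpu_state {
	CPU_IDLE,	/* In the idle loop. */
	CPU_USER,	/* Executing in user mode. */
	CPU_KERNEL,	/* Executing in the kernel, not idle. */
};

/* Zero-initialized, so every CPU starts out as CPU_IDLE. */
static enum cpu_state cpu_states[NR_CPUS];

static void set_cpu_state(size_t cpu, enum cpu_state state)
{
	assert(cpu < NR_CPUS);
	cpu_states[cpu] = state;
}

/* RCU's view: both idle and usermode execution count as idle. */
static bool rcu_view_idle(size_t cpu)
{
	assert(cpu < NR_CPUS);
	return cpu_states[cpu] != CPU_KERNEL;
}

/* Full-system-idle view: usermode execution counts as busy. */
static bool sysidle_view_idle(size_t cpu)
{
	assert(cpu < NR_CPUS);
	return cpu_states[cpu] == CPU_IDLE;
}

/*
 * System-wide question: are -all- CPUs truly idle?  A per-CPU check
 * cannot answer this by itself, so this sketch scans every CPU.
 */
static bool full_system_idle(void)
{
	size_t cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (!sysidle_view_idle(cpu))
			return false;
	return true;
}
```

With every CPU in CPU_IDLE, full_system_idle() returns true.  Moving a
single CPU to CPU_USER leaves rcu_view_idle() true for that CPU, but
makes full_system_idle() return false, which is the case where some CPU
must keep the scheduling-clock tick going.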