On Tue, Apr 15, 2014 at 12:51 PM, Peter Zijlstra <pet...@infradead.org> wrote: > On Wed, Apr 02, 2014 at 10:17:52PM +0200, Denys Vlasenko wrote: >> When some call site uses get_cpu_*_time_us() to read a sleeptime >> stat, it deduces the total sleeptime by adding the pending time >> to the last sleeptime snapshot if the CPU target is idle. >> >> But this only works if idle_sleeptime, idle_entrytime and idle_active are >> read and updated under some disciplined order. >> >> This patch changes updaters to modify idle_entrytime, >> {idle,iowait}_sleeptime, and idle_active exactly in this order, >> with write barriers on SMP to ensure other CPUs see then in this order too. >> Readers are changed read them in the opposite order, with read barriers. >> When readers detect a race by seeing cleared idle_entrytime, >> they retry the reads. >> >> The "iowait-ness" of every idle period is decided at the moment it starts: >> if this CPU's run-queue had tasks waiting on I/O, then this idle >> period's duration will be added to iowait_sleeptime. >> This, along with proper SMP syncronization, fixes the bug where iowait >> counts could go backwards. > > It also makes for a near infinite source of iowait. Who is to say the > CPU that started with iowait will ever wake up? The nohz sleep time is > practically unbounded.
Yes, it does that. I prepared a new patchset which fixes that. It also addresses your other points: >> /* Updates the per cpu time idle statistics counters */ >> delta = ktime_sub(now, ts->idle_entrytime); >> + ts->idle_entrytime.tv64 = 0; > > One must at all times describe the memory ordering and pairing barrier > in a comment when placing barriers. Done. >> + if (start.tv64 == 0) >> + /* Other CPU is updating the count. >> + * We don't know whether fetched count is valid. >> + */ >> + goto again; > > This is double wrong; any multi line stmt (even if its a single stmt) > should have {}. Also, wrong multi line comment style. Fixed. > You're not nearly lazy enough; this is a near identical copy of the > above, two nearly identical copies of 'tricky' code is a bad idea. True. I was planning to address that, but new logic added to avoid overcounting iowait made these two functions not as similar as they used to be. Please review a newer patchset, I'll send it in a minute. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/