When the nohz idle time is fetched, the current clock timestamp is read
outside the seqcount read section, which allows the following race, as
reported by Sashiko:

    get_cpu_sleep_time_us()             tick_nohz_start_idle()
    -----------------------             ---------------------
    now = ktime_get()
                                        write_seqcount_begin(idle_sleeptime_seq);
                                        idle_entrytime = ktime_get()
                                        tick_sched_flag_set(ts,
                                                TS_FLAG_IDLE_ACTIVE);
                                        write_seqcount_end(&ts->idle_sleeptime_seq);
    read_seqcount_begin(idle_sleeptime_seq)
    delta = now - idle_entrytime;
    //!! But now < idle_entrytime
    idle = *sleeptime + delta;
    read_seqcount_retry(&ts->idle_sleeptime_seq, seq)

Here the read side fetches its timestamp before the write side records
the idle entry time. As a result, the time delta computed on the read
side is negative (ktime_t is signed), which breaks the cputime
monotonicity guarantee.
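
As a rough illustration of the arithmetic above, the following standalone
userspace sketch mimics the unclamped read-side computation. The type
ktime_sketch_t and the function idle_unclamped() are hypothetical
stand-ins, not kernel symbols, and the seqcount is left out on purpose
since the stale timestamp is taken outside of it anyway:

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ktime_t (signed 64-bit ns). */
typedef int64_t ktime_sketch_t;

/*
 * Unclamped computation, modelled on the read side before this patch.
 * If 'now' was sampled before 'idle_entrytime' was written, the delta
 * is negative and the result is smaller than 'sleeptime'.
 */
static int64_t idle_unclamped(ktime_sketch_t sleeptime, ktime_sketch_t now,
			      ktime_sketch_t idle_entrytime)
{
	ktime_sketch_t delta = now - idle_entrytime;	/* may be negative */

	return sleeptime + delta;
}
```

With sleeptime = 1000, now = 500 and idle_entrytime = 520, the sketch
returns 980, which is below the 1000 already accumulated, so a caller
that previously observed 1000 would see the idle time go backwards.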
This could be fixed by reading the current clock timestamp inside the
seqcount read section, but that might increase the reader overhead.
Simply checking that the current timestamp is later than the idle entry
time is enough to prevent the problem.
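
The check described above can be sketched in standalone userspace C as
follows. This is only a model of the idea, assuming plain int64_t
nanosecond values in place of ktime_t; idle_clamped() is a hypothetical
name and the seqcount retry loop is omitted:

```c
#include <stdint.h>

/*
 * Clamped computation: the delta is only added when 'now' is strictly
 * later than 'idle_entrytime'. A stale 'now' therefore contributes
 * nothing instead of a negative amount.
 */
static int64_t idle_clamped(int64_t sleeptime, int64_t now,
			    int64_t idle_entrytime)
{
	int64_t delta = 0;

	if (now > idle_entrytime)
		delta = now - idle_entrytime;

	return sleeptime + delta;
}
```

In this model a stale timestamp leaves the accumulated sleep time
unchanged, while a timestamp taken after idle entry still adds the
elapsed idle period.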
Reported-by: Sashiko
Fixes: 620a30fa0bd1 ("timers/nohz: Protect idle/iowait sleep time under seqcount")
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index cbbb87a0c6e7..171393367b5c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -797,15 +797,16 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
*last_update_time = ktime_to_us(now);
do {
+ ktime_t delta = 0;
+
seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
if (tick_sched_flag_test(ts, TS_FLAG_IDLE_ACTIVE) && compute_delta) {
- ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-
- idle = ktime_add(*sleeptime, delta);
- } else {
- idle = *sleeptime;
+ if (now > ts->idle_entrytime)
+ delta = ktime_sub(now, ts->idle_entrytime);
}
+
+ idle = ktime_add(*sleeptime, delta);
} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
return ktime_to_us(idle);
--
2.53.0