When a cfs_rq is throttled, parent cfs_rq->nr_running is decreased and
everything happens at cfs_rq level. Currently util_est stays unchanged
in such case and it keeps accounting the utilization of throttled tasks.
This can somewhat make sense as we don't dequeue tasks but only throttled
cfs_rq.
If a task of another group is enqueued/dequeued and root cfs_rq becomes
idle during the dequeue, util_est will be cleared whereas it was
accounting util_est of throttled tasks before. So the behavior of util_est
is not always the same regarding throttled tasks and depends of side
activity. Furthermore, util_est will not be updated when the cfs_rq is
unthrottled as everything happens at cfs rq level. Main results is that
util_est will stay null whereas we now have running tasks. We have to wait
for the next dequeue/enqueue of the previously throttled tasks to get an
up to date util_est.

Remove the assumption that cfs_rq's estimated utilization of a CPU is 0
if there is no running task so the util_est of a task remains until the
latter is dequeued even if its cfs_rq has been throttled.

Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
Signed-off-by: Vincent Guittot <[email protected]>
---
 kernel/sched/fair.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e497c05..d3121fc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3982,18 +3982,10 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct 
task_struct *p, bool task_sleep)
        if (!sched_feat(UTIL_EST))
                return;
 
-       /*
-        * Update root cfs_rq's estimated utilization
-        *
-        * If *p is the last task then the root cfs_rq's estimated utilization
-        * of a CPU is 0 by definition.
-        */
-       ue.enqueued = 0;
-       if (cfs_rq->nr_running) {
-               ue.enqueued  = cfs_rq->avg.util_est.enqueued;
-               ue.enqueued -= min_t(unsigned int, ue.enqueued,
-                                    (_task_util_est(p) | UTIL_AVG_UNCHANGED));
-       }
+       /* Update root cfs_rq's estimated utilization */
+       ue.enqueued  = cfs_rq->avg.util_est.enqueued;
+       ue.enqueued -= min_t(unsigned int, ue.enqueued,
+                            (_task_util_est(p) | UTIL_AVG_UNCHANGED));
        WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
 
        /*
-- 
2.7.4

Reply via email to