Hi,

On 31/08/20 13:07, Lucas Stach wrote:
> When a boosted task gets throttled, what normally happens is that it's
> immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the
> runtime and clears the dl_throttled flag. There is a special case however:
> if the throttling happened on sched-out and the task has been deboosted in
> the meantime, the replenish is skipped as the task will return to its
> normal scheduling class. This leaves the task with the dl_throttled flag
> set.
> 
> Now if the task gets boosted up to the deadline scheduling class again
> while it is sleeping, it's still in the throttled state. The normal wakeup
> however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't
> actually place it on the rq. Thus we end up with a task that is runnable,
> but not actually on the rq, and neither does an immediate replenishment happen,
> nor is the replenishment timer set up, so the task is stuck in
> forever-throttled limbo.
> 
> Clear the dl_throttled flag before dropping back to the normal scheduling
> class to fix this issue.
> 
> Signed-off-by: Lucas Stach <[email protected]>
> ---
> This is the root cause and fix of the issue described at [1]. After working
> on other stuff for the last few months, I finally was able to circle back
> to this issue and gather the required data to pinpoint the failure mode.
> 
> [1] https://lkml.org/lkml/2020/3/20/765
> ---
>  kernel/sched/deadline.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 3862a28cd05d..c19c1883d695 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1527,12 +1527,15 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>               pi_se = &pi_task->dl;
>       } else if (!dl_prio(p->normal_prio)) {
>               /*
> -              * Special case in which we have a !SCHED_DEADLINE task
> -              * that is going to be deboosted, but exceeds its
> -              * runtime while doing so. No point in replenishing
> -              * it, as it's going to return back to its original
> -              * scheduling class after this.
> +              * Special case in which we have a !SCHED_DEADLINE task that is going
> +              * to be deboosted, but exceeds its runtime while doing so. No point in
> +              * replenishing it, as it's going to return back to its original
> +              * scheduling class after this. If it has been throttled, we need to
> +              * clear the flag, otherwise the task may wake up as throttled after
> +              * being boosted again with no means to replenish the runtime and clear
> +              * the throttle.
>                */
> +             p->dl.dl_throttled = 0;
>               BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH);
>               return;
>       }

Ah, right, thanks for looking into this issue!

Wonder if we should be calling __dl_clear_params() instead of just
clearing dl_throttled, but what you propose makes sense to me.

Acked-by: Juri Lelli <[email protected]>

Best,

Juri
