On Tue, 5 Apr 2016 17:00:36 +0200
Peter Zijlstra <pet...@infradead.org> wrote:

> On Fri, Apr 01, 2016 at 05:12:29PM +0200, Luca Abeni wrote:
> > +static void task_go_inactive(struct task_struct *p)
> > +{
> > +   struct sched_dl_entity *dl_se = &p->dl;
> > +   struct hrtimer *timer = &dl_se->inactive_timer;
> > +   struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > +   struct rq *rq = rq_of_dl_rq(dl_rq);
> > +   ktime_t now, act;
> > +   s64 delta;
> > +   u64 zerolag_time;
> > +
> > +   WARN_ON(dl_se->dl_runtime == 0);
> > +
> > +   /* If the inactive timer is already armed, return immediately */
> > +   if (hrtimer_active(&dl_se->inactive_timer))
> > +           return;
> 
> So while we start the timer on the local cpu, we don't migrate the timer
> when we migrate the task, so the callback can happen on a remote cpu,
> right?
> 
> Therefore, the timer function might still be running, but just have done
> task_rq_unlock(), which would have allowed our cpu to acquire the
> rq->lock and get here.
> 
> Then the above check is true, we'll quit, but effectively the inactive
> timer will not run 'again'.
Uhm... So the problem is:
- Task T wakes up, but cannot cancel its inactive timer, because it is running
        + This should not be a problem: inactive_task_timer() will return 
without
          doing anything
- Before inactive_task_timer() can actually run, task T migrates to a different 
CPU
- Befere the timer finishes to run, the task blocks again... So, 
task_go_inactive()
  sees the timer as active and returns immediately. But the timer has already
  executed (without doing anything). So noone decreases the rq utilisation.

I did not think about this issue, and I never managed to trigger it in my
tests... I'll try to see how it can be addressed. Do you have any suggestions?

[...]
> > @@ -1071,6 +1164,23 @@ select_task_rq_dl(struct task_struct *p, int cpu, 
> > int sd_flag, int flags)
> >     }
> >     rcu_read_unlock();
> >  
> > +   if (rq != cpu_rq(cpu)) {
> 
> I don't think this is right, you want:
> 
>       if (task_cpu(p) != cpu) {
> 
> because @cpu does not need to be task_cpu().
Uhm... I must have misunderstood something in the code, then :(
What I want to do here is to check if select_task_rq_dl() selected
a new CPU for this task... Since at the beginning of the function
rq is set as
        rq = cpu_rq(cpu);
I was thinkint about checking if this is still true (if not, it
means that the value of "cpu" changed).

I'll look at it again.


> 
> > +           int migrate_active;
> > +
> > +           raw_spin_lock(&rq->lock);
> 
> Which then also means @rq is 'wrong', so you'll have to add:
> 
>               rq = task_rq(p);
Ok; I completely misunderstood the current code, then... :(


> 
> before this.
> 
> > +           migrate_active = hrtimer_active(&p->dl.inactive_timer);
> > +           if (migrate_active)
> > +                   sub_running_bw(&p->dl, &rq->dl);
> > +           raw_spin_unlock(&rq->lock);
> 
> At this point task_rq() is still the above rq, so if the inactive timer
> hits here it will lock this rq and subtract the running bw here _again_,
> right?
I think it will see the task state as TASK_RUNNING, so it will do nothing.
Or it will cancelled later when the task is enqueued... I'll double check this.



                        Thanks,
                                Luca

> 
> > +           if (migrate_active) {
> > +                   rq = cpu_rq(cpu);
> > +                   raw_spin_lock(&rq->lock);
> > +                   add_running_bw(&p->dl, &rq->dl);
> > +                   raw_spin_unlock(&rq->lock);
> > +           }
> > +   }

Reply via email to