On Thu, Nov 05, 2020 at 01:50:19PM -0500, Joel Fernandes wrote:
> On Mon, Oct 26, 2020 at 10:31:31AM +0100, Peter Zijlstra wrote:
> > On Fri, Oct 23, 2020 at 05:31:18PM -0400, Joel Fernandes wrote:
> > > On Fri, Oct 23, 2020 at 09:26:54PM +0200, Peter Zijlstra wrote:
> > 
> > > > How about this then?
> > > 
> > > This does look better. It makes sense and I think it will work. I will 
> > > look
> > > more into it and also test it.
> > 
> > Hummm... Looking at it again I wonder if I can make something like the
> > below work.
> > 
> > (depends on the next patch that pulls core_forceidle into core-wide
> > state)
> > 
> > That would retain the CFS-cgroup optimization as well, for as long as
> > there's no cookies around.
> > 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4691,8 +4691,6 @@ pick_next_task(struct rq *rq, struct tas
> >             return next;
> >     }
> >  
> > -   put_prev_task_balance(rq, prev, rf);
> > -
> >     smt_mask = cpu_smt_mask(cpu);
> >  
> >     /*
> > @@ -4707,14 +4705,25 @@ pick_next_task(struct rq *rq, struct tas
> >      */
> >     rq->core->core_task_seq++;
> >     need_sync = !!rq->core->core_cookie;
> > -
> > -   /* reset state */
> > -reset:
> > -   rq->core->core_cookie = 0UL;
> >     if (rq->core->core_forceidle) {
> >             need_sync = true;
> >             rq->core->core_forceidle = false;
> >     }
> > +
> > +   if (!need_sync) {
> > +           next = __pick_next_task(rq, prev, rf);
> 
> This could end up triggering pick_next_task_fair's newidle balancing;
> 
> > +           if (!next->core_cookie) {
> > +                   rq->core_pick = NULL;
> > +                   return next;
> > +           }
> 
> .. only to realize here that pick_next_task_fair() that we have to put_prev
> the task back as it has a cookie, but the effect of newidle balancing cannot
> be reverted.
> 
> Would that be a problem as the newly pulled task might be incompatible and
> would have been better to leave it alone?
> 
> TBH, this is a drastic change and we've done a lot of testing with the
> current code and its looking good. I'm a little scared of changing it right
> now and introducing regression. Can we maybe do this after the existing
> patches are upstream?

After sleeping over it, I am trying something like the following. Thoughts?

Basically, I call pick_task() in advance. That's mostly all that's different
with your patch:

---8<-----------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0ce17aa72694..366e5ed84a63 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5000,28 +5000,34 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
struct rq_flags *rf)
        put_prev_task_balance(rq, prev, rf);
 
        smt_mask = cpu_smt_mask(cpu);
-
-       /*
-        * core->core_task_seq, core->core_pick_seq, rq->core_sched_seq
-        *
-        * @task_seq guards the task state ({en,de}queues)
-        * @pick_seq is the @task_seq we did a selection on
-        * @sched_seq is the @pick_seq we scheduled
-        *
-        * However, preemptions can cause multiple picks on the same task set.
-        * 'Fix' this by also increasing @task_seq for every pick.
-        */
-       rq->core->core_task_seq++;
        need_sync = !!rq->core->core_cookie;
 
        /* reset state */
-reset:
        rq->core->core_cookie = 0UL;
        if (rq->core->core_forceidle) {
                need_sync = true;
                fi_before = true;
                rq->core->core_forceidle = false;
        }
+
+       /*
+        * Optimize for common case where this CPU has no cookies
+        * and there are no cookied tasks running on siblings.
+        */
+       if (!need_sync) {
+               for_each_class(class) {
+                       next = class->pick_task(rq);
+                       if (next)
+                               break;
+               }
+
+               if (!next->core_cookie) {
+                       rq->core_pick = NULL;
+                       goto done;
+               }
+               need_sync = true;
+       }
+
        for_each_cpu(i, smt_mask) {
                struct rq *rq_i = cpu_rq(i);
 
@@ -5039,6 +5045,18 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
struct rq_flags *rf)
                }
        }
 
+       /*
+        * core->core_task_seq, core->core_pick_seq, rq->core_sched_seq
+        *
+        * @task_seq guards the task state ({en,de}queues)
+        * @pick_seq is the @task_seq we did a selection on
+        * @sched_seq is the @pick_seq we scheduled
+        *
+        * However, preemptions can cause multiple picks on the same task set.
+        * 'Fix' this by also increasing @task_seq for every pick.
+        */
+       rq->core->core_task_seq++;
+
        /*
         * Try and select tasks for each sibling in decending sched_class
         * order.
@@ -5059,40 +5077,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, 
struct rq_flags *rf)
                         * core.
                         */
                        p = pick_task(rq_i, class, max);
-                       if (!p) {
-                               /*
-                                * If there weren't no cookies; we don't need to
-                                * bother with the other siblings.
-                                */
-                               if (i == cpu && !need_sync)
-                                       goto next_class;
-
+                       if (!p)
                                continue;
-                       }
-
-                       /*
-                        * Optimize the 'normal' case where there aren't any
-                        * cookies and we don't need to sync up.
-                        */
-                       if (i == cpu && !need_sync) {
-                               if (p->core_cookie) {
-                                       /*
-                                        * This optimization is only valid as
-                                        * long as there are no cookies
-                                        * involved. We may have skipped
-                                        * non-empty higher priority classes on
-                                        * siblings, which are empty on this
-                                        * CPU, so start over.
-                                        */
-                                       need_sync = true;
-                                       goto reset;
-                               }
-
-                               next = p;
-                               trace_printk("unconstrained pick: %s/%d %lx\n",
-                                            next->comm, next->pid, 
next->core_cookie);
-                               goto done;
-                       }
 
                        if (!is_task_rq_idle(p))
                                occ++;

Reply via email to