rt: Simplify the IPI rt balancing logic

Steven Rostedt Thu, 04 May 2017 10:26:09 -0700

On Thu, 4 May 2017 17:32:56 +0200
Peter Zijlstra wrote:


> On Mon, Apr 24, 2017 at 11:47:32AM -0400, Steven Rostedt wrote:
> >  static int rto_next_cpu(struct rq *rq)
> >  {
> >     int cpu;
> >  
> >     /*
> > +    * When starting the IPI RT pushing, the rto_cpu is set to nr_cpu_ids
> > +    * or greater. rto_next_cpu() will simply return the first CPU found in
> > +    * the rto_mask.
> > +    *
> > +    * If rto_next_cpu() is called with rto_cpu less than nr_cpu_ids, it
> > +    * will return the next CPU found in the rto_mask.
> > +    *
> > +    * If there are no more CPUs left in the rto_mask, then a check is made
> > +    * against rto_loop and rto_loop_next. rto_loop is only updated with
> > +    * the rto_lock held, but any CPU may increment the rto_loop_next
> > +    * without any locking.
> >      */
> > +again:
> > +   if (rq->rd->rto_cpu >= nr_cpu_ids) {
> >             cpu = cpumask_first(rq->rd->rto_mask);
> > +           rq->rd->rto_cpu = cpu;
> > +           /* If cpu is nr_cpu_ids, then there are no overloaded rqs */
> > +           return cpu;
> >     }
> >  
> > +   cpu = cpumask_next(rq->rd->rto_cpu, rq->rd->rto_mask);
> > +   rq->rd->rto_cpu = cpu;
> >  
> > +   if (cpu < nr_cpu_ids)
> > +           return cpu;
> >  
> > +   if (rq->rd->rto_loop == atomic_read(&rq->rd->rto_loop_next))
> > +           return cpu;
> >  
> > +   rq->rd->rto_loop = atomic_read(&rq->rd->rto_loop_next);
> > +   goto again;
> > +}  
> 
> I think you want to write that as:
> 
>       struct root_domain *rd = rq->rd;
>       int cpu, next;
> 
>       /* comment */
>       for (;;) { 
>               if (rd->rto_cpu >= nr_cpu_ids) {

If we go with your change, then this needs to be:

                if (rd->rto_cpu < 0) {

>                       cpu = cpumask_first(rd->rto_mask);
>                       rd->rto_cpu = cpu;
>                       return cpu;
>               }
> 
>               cpu = cpumask_next(rd->rto_mask);

cpumask_next() requires two parameters, the previous CPU and the mask:
cpumask_next(rd->rto_cpu, rd->rto_mask).

>               rd->rto_cpu = cpu;
> 
>               if (cpu < nr_cpu_ids)
>                       break;
> 
> //            rd->rto_cpu = -1;
> 
>               /*
>                * ACQUIRE ensures we see the @rto_mask changes
>                * made prior to the @next value observed.
>                * 
>                * Matches WMB in rt_set_overload().
>                */
>               next = atomic_read_acquire(&rd->rto_loop_next);
> 
>               if (rd->rto_loop == next)
>                       break;
> 
>               rd->rto_loop = next;
>       }
> 
>       return cpu;
> 
> And I don't fully understand the whole rto_cpu >= nr_cpus_ids thing,
> can't you simply reset the thing to -1 and always use cpumask_next()?
> As per the // comment above?
> 
> > +static inline bool rto_start_trylock(atomic_t *v)
> > +{
> > +   return !atomic_cmpxchg(v, 0, 1);  
> 
> Arguably this could be: !atomic_cmpxchg_acquire(v, 0, 1);

Yes, agreed. But if you remember, at the time I was basing this off of
tip/sched/core, which didn't have atomic_cmpxchg_acquire() available.
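
For illustration only, here is a userspace C11 sketch of the trylock/unlock
pair with ACQUIRE on the successful cmpxchg and RELEASE on the unlock. It is
an analogy, not the kernel code: the names start_trylock(), start_unlock()
and loop_start are hypothetical, and <stdatomic.h> stands in for the kernel's
atomic_cmpxchg_acquire() and atomic_set_release().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for rd->rto_loop_start (starts at 0). */
static atomic_int loop_start;

/* Succeeds only for the caller that flips 0 -> 1; ACQUIRE on success. */
static bool start_trylock(atomic_int *v)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(
		v, &expected, 1,
		memory_order_acquire,	/* ordering when the swap happens */
		memory_order_relaxed);	/* ordering when it fails */
}

/* RELEASE store; pairs with the ACQUIRE in start_trylock(). */
static void start_unlock(atomic_int *v)
{
	/* Only the current holder should unlock. */
	assert(atomic_load_explicit(v, memory_order_relaxed) == 1);
	atomic_store_explicit(v, 0, memory_order_release);
}
```

A second start_trylock(&loop_start) fails until the holder calls
start_unlock(&loop_start), which is the "only one CPU can initiate a loop at
a time" behavior.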

> 
> >  }
> >  
> > +static inline void rto_start_unlock(atomic_t *v)
> > +{
> > +   atomic_set_release(v, 0);
> > +}
> >    
> 
> >  static void tell_cpu_to_push(struct rq *rq)
> >  {
> > +   int cpu = nr_cpu_ids;
> >  
> > +   /* Keep the loop going if the IPI is currently active */
> > +   atomic_inc_return(&rq->rd->rto_loop_next);  
> 
> Since rt_set_overload() already provides a WMB, we don't need an
> ordered primitive here and atomic_inc() is fine.

Agreed, I mentioned this in my previous reply. It was left over from
previous versions of the patch. I believe I also needed a memory
barrier between this and the check for rto_loop_start. I can't remember
if that was the case, but it doesn't matter now that rto_loop_start is
updated with a cmpxchg.
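
A rough userspace C11 analogy of the ordering being discussed, under stated
assumptions: a release fence stands in for the WMB in rt_set_overload(), a
relaxed fetch-add stands in for plain atomic_inc(), and an acquire load
stands in for atomic_read_acquire(). The names publish_overload(), observe()
and overload_mask are hypothetical, and the kernel does the mask update and
the increment in separate functions rather than in one as here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for rd->rto_mask and rd->rto_loop_next. */
static atomic_uint_fast32_t overload_mask;
static atomic_int loop_next;

/* Publisher: mark 'cpu' overloaded, then request another pass. */
static void publish_overload(int cpu)
{
	assert(cpu >= 0 && cpu < 32);
	atomic_fetch_or_explicit(&overload_mask, UINT32_C(1) << cpu,
				 memory_order_relaxed);
	/* Stands in for the WMB: mask update is ordered before the inc. */
	atomic_thread_fence(memory_order_release);
	/* Unordered increment, analogous to plain atomic_inc(). */
	atomic_fetch_add_explicit(&loop_next, 1, memory_order_relaxed);
}

/*
 * Walker: the ACQUIRE load means the mask read that follows sees every
 * mask update made before the observed loop_next value was written.
 */
static int observe(uint_fast32_t *mask_out)
{
	int next = atomic_load_explicit(&loop_next, memory_order_acquire);

	*mask_out = atomic_load_explicit(&overload_mask,
					 memory_order_relaxed);
	return next;
}
```

The point of the sketch is that the increment itself carries no ordering;
the fence before it and the acquire on the reader side do the work.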

> 
> >  
> > +   /* Only one CPU can initiate a loop at a time */
> > +   if (!rto_start_trylock(&rq->rd->rto_loop_start))
> >             return;
> >  
> > +   raw_spin_lock(&rq->rd->rto_lock);
> > +
> > +   /*
> > +    * The rto_cpu is updated under the lock, if it has a valid cpu
> > +    * then the IPI is still running and will continue due to the
> > +    * update to loop_next, and nothing needs to be done here.
> > +    * Otherwise it is finishing up and an ipi needs to be sent.
> > +    */
> > +   if (rq->rd->rto_cpu >= nr_cpu_ids)  
> //    if (rq->rd->rto_cpu < 0)

This can be done. I was just being a bit more conservative by giving
rto_cpu fewer states (a valid CPU or nr_cpu_ids). With -1, we have to
set it manually. But I'm fine with doing it that way too.

This went through several iterations. There were times when using -1
wasn't so simple.
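
To make the -1 variant concrete, here is a single-threaded userspace model
of the walk, with no atomics or locking. It is only a sketch: struct
rd_model, mask_next(), next_cpu_model() and NR_CPUS are hypothetical names,
and a plain 32-bit mask stands in for the cpumask. Starting the search from
-1 makes mask_next() behave like cpumask_first(), so no separate first-CPU
branch is needed.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical; stands in for nr_cpu_ids */

/* Hypothetical model of the root_domain fields discussed here. */
struct rd_model {
	uint32_t rto_mask;	/* bit n set: CPU n is RT overloaded */
	int rto_cpu;		/* -1 when no IPI loop is in flight */
	int rto_loop;		/* last loop_next value the walker saw */
	int rto_loop_next;	/* bumped by any CPU requesting a new pass */
};

/* Like cpumask_next(): first set bit strictly after 'prev', or NR_CPUS. */
static int mask_next(int prev, uint32_t mask)
{
	assert(prev >= -1 && prev < NR_CPUS);
	for (int cpu = prev + 1; cpu < NR_CPUS; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	return NR_CPUS;
}

/* Returns the next overloaded CPU, or -1 when the walk is finished. */
static int next_cpu_model(struct rd_model *rd)
{
	for (;;) {
		int cpu = mask_next(rd->rto_cpu, rd->rto_mask);

		if (cpu < NR_CPUS) {
			rd->rto_cpu = cpu;
			return cpu;
		}

		/* End of mask: reset so the next walk starts at the front. */
		rd->rto_cpu = -1;

		if (rd->rto_loop == rd->rto_loop_next)
			return -1;

		/* Another pass was requested while we were walking. */
		rd->rto_loop = rd->rto_loop_next;
	}
}
```

With this shape the caller's "is a loop in flight" check becomes
rto_cpu < 0, at the cost of the explicit reset at the end of the mask.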

> 
> > +           cpu = rto_next_cpu(rq);
> >  
> > +   raw_spin_unlock(&rq->rd->rto_lock);
> > +
> > +   rto_start_unlock(&rq->rd->rto_loop_start);
> > +
> > +   if (cpu < nr_cpu_ids)
> > +           irq_work_queue_on(&rq->rd->rto_push_work, cpu);
> >  }
> >  
> >  /* Called from hardirq context */
> > +void rto_push_irq_work_func(struct irq_work *work)
> >  {
> > +   struct rq *rq;
> >     int this_cpu;
> >     int cpu;
> >  
> > +   this_cpu = smp_processor_id();
> >     rq = cpu_rq(this_cpu);  
> 
>       rq = this_rq();

Heh, sure. I guess I was just keeping it with the previous logic.

Thanks for the review. I'll spin up a new patch. Unfortunately, I no
longer have access to the behemoth machine. I'll only be testing this
on 4 cores now, or 8 with HT.

-- Steve


> 
> >  
> > +   /*
> > +    * We do not need to grab the lock to check for has_pushable_tasks.
> > +    * When it gets updated, a check is made if a push is possible.
> > +    */
> >     if (has_pushable_tasks(rq)) {
> >             raw_spin_lock(&rq->lock);
> > +           push_rt_tasks(rq);
> >             raw_spin_unlock(&rq->lock);
> >     }
> >  
> > +   raw_spin_lock(&rq->rd->rto_lock);
> >  
> > +   /* Pass the IPI to the next rt overloaded queue */
> > +   cpu = rto_next_cpu(rq);
> >  
> > +   raw_spin_unlock(&rq->rd->rto_lock);
> >  
> >     if (cpu >= nr_cpu_ids)
> >             return;
> >  
> >     /* Try the next RT overloaded CPU */
> > +   irq_work_queue_on(&rq->rd->rto_push_work, cpu);
> >  }

Re: [PATCH tip/sched/core v2] sched/rt: Simplify the IPI rt balancing logic