uclamp: Protect uclamp fast path code with static key

Qais Yousef Tue, 30 Jun 2020 02:46:57 -0700

Hi Patrick

On 06/30/20 10:11, Patrick Bellasi wrote:
> 
> Hi Qais,
> here are some more 2c from me...
> 
> On Mon, Jun 29, 2020 at 18:26:33 +0200, Qais Yousef <[email protected]> 
> wrote...
> 
> [...]
> 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 235b2cae00a0..8d80d6091d86 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -794,6 +794,26 @@ unsigned int sysctl_sched_uclamp_util_max = 
> > SCHED_CAPACITY_SCALE;
> >  /* All clamps are required to be less or equal than these values */
> >  static struct uclamp_se uclamp_default[UCLAMP_CNT];
> >  
> > +/*
> > + * This static key is used to reduce the uclamp overhead in the fast path. 
> > It
> > + * primarily disables the call to uclamp_rq_{inc, dec}() in
> > + * enqueue/dequeue_task().
> > + *
> > + * This allows users to continue to enable uclamp in their kernel config 
> > with
> > + * minimum uclamp overhead in the fast path.
> > + *
> > + * As soon as userspace modifies any of the uclamp knobs, the static key is
> > + * enabled, since we have an actual users that make use of uclamp
> > + * functionality.
> > + *
> > + * The knobs that would enable this static key are:
> > + *
> > + *   * A task modifying its uclamp value with sched_setattr().
> > + *   * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
> > + *   * An admin modifying the cgroup cpu.uclamp.{min, max}
> 
> I guess this list can be obtained with a grep or git changelog, moreover
> this text will require maintenance.
> 
> What about replacing this full comment with something shorted like:
> 
> ---8<---
>       Static key to reduce uclamp overhead in the fast path by disabling
>       calls to uclamp_rq_{inc, dec}().
> ---8<---


If you don't mind, I rather more verbose info. As a relatively new comer, lack
of comments about expectation of some functions is still a challenge.

> 
> > + */
> > +DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
> > +
> >  /* Integer rounded range for each bucket */
> >  #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, 
> > UCLAMP_BUCKETS)
> >  
> > @@ -994,9 +1014,30 @@ static inline void uclamp_rq_dec_id(struct rq *rq, 
> > struct task_struct *p,
> >     lockdep_assert_held(&rq->lock);
> >  
> >     bucket = &uc_rq->bucket[uc_se->bucket_id];
> > -   SCHED_WARN_ON(!bucket->tasks);
> > +
> > +   /*
> > +    * bucket->tasks could be zero if sched_uclamp_used was enabled while
> > +    * the current task was running, hence we could end up with unbalanced
> > +    * call to uclamp_rq_dec_id().
> > +    *
> > +    * Need to be careful of the following enqeueue/dequeue order
> > +    * problem too
> > +    *
> > +    *      enqueue(taskA)
> > +    *      // sched_uclamp_used gets enabled
> > +    *      enqueue(taskB)
> > +    *      dequeue(taskA)
> > +    *      // bucket->tasks is now 0
> > +    *      dequeue(taskB)
> > +    *
> > +    * where we could end up with uc_se->active of the task set to true and
> > +    * the wrong bucket[uc_se->bucket_id].value.
> > +    *
> > +    * Hence always make sure we reset things properly.
> > +    */
> >     if (likely(bucket->tasks))
> >             bucket->tasks--;
> > +
> >     uc_se->active = false;
> 
> Better than v4, what about just using this active flag?
> 
> ---8<---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8f360326861e..465a7645713b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -990,6 +990,13 @@ static inline void uclamp_rq_dec_id(struct rq *rq, 
> struct task_struct *p,
>  
>         lockdep_assert_held(&rq->lock);
>  
> +       /*
> +        * If a task was already enqueue at uclamp enable time
> +        * nothing has been accounted for it.
> +        */
> +       if (unlikely(!uc_se->active))
> +               return;
> +
>         bucket = &uc_rq->bucket[uc_se->bucket_id];
>         SCHED_WARN_ON(!bucket->tasks);
>         if (likely(bucket->tasks))
> ---8<---
> 
> This will allow also to keep in all the ref count checks we have,
> e.g. the SChed_WARN_ON().

Works for me. Though I'd like to expand on the comment more just because there
were few things that were caught out and worth documenting IMO.

> 
> 
> >     /*
> > @@ -1032,6 +1073,13 @@ static inline void uclamp_rq_inc(struct rq *rq, 
> > struct task_struct *p)
> >  {
> >     enum uclamp_id clamp_id;
> >  
> > +   /*
> > +    * Avoid any overhead until uclamp is actually used by the userspace.
> > +    * Including the branch if we use static_branch_likely()
> 
> I still find this last sentence hard to parse, but perhaps it's just me
> still missing a breakfast :)

It used to be

         * Including the JMP if we use static_branch_likely()

Note s/branch/JMP/

Effectively the condition is written such that we produce a NOP when uclamp is
not used. I'll rephrase.

> 
> > +    */
> > +   if (!static_branch_unlikely(&sched_uclamp_used))
> > +           return;
> 
> I'm also still wondering if the optimization is still working when we
> have that ! in front.

It does. I looked at the generated code before posting.

> 
> Had a check at:
> 
>    
> https://elixir.bootlin.com/linux/latest/source/include/linux/jump_label.h#L399
> 
> and AFAIU, it all boils down to cook a __branch_check()'s compiler hint,
> and ISTR that those are "anti-patterns"?
> 
> That said we do have some usages for this pattern too:
> 
> $ git grep '!static_branch_unlikely' | wc -l       36
> $ git grep 'static_branch_unlikely' | wc -l       220
> 
> ?
> 
> > +
> >     if (unlikely(!p->sched_class->uclamp_enabled))
> >             return;
> >  
> 
> [...]
> 
> > +/**
> > + * uclamp_rq_util_with - clamp @util with @rq and @p effective uclamp 
> > values.
> > + * @rq:            The rq to clamp against. Must not be NULL.
> > + * @util:  The util value to clamp.
> > + * @p:             The task to clamp against. Can be NULL if you want to 
> > clamp
> > + *         against @rq only.
> > + *
> > + * Clamps the passed @util to the max(@rq, @p) effective uclamp values.
> > + *
> > + * If sched_uclamp_used static key is disabled, then just return the util
> > + * without any clamping since uclamp aggregation at the rq level in the 
> > fast
> > + * path is disabled, rendering this operation a NOP.
> > + *
> > + * Use uclamp_eff_value() if you don't care about uclamp values at rq 
> > level. It
> > + * will return the correct effective uclamp value of the task even if the
> > + * static key is disabled.
> 
> Well, if you don't care about rq, you don't call a uclamp_rq_* method.
> 
> I would say that the above paragraph is redundant, moreover it adds some
> cross-reference to a different method (name) which required maintenance.
> 
> What about removing it?

I'd rather keep this one too. It helps explaining what the expected way to use
this code. I don't think the maintenance is a big issue? We have to maintain
the code anyway?

> 
> > + */
> >  static __always_inline
> >  unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
> >                               struct task_struct *p)
> >  {
> > -   unsigned long min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
> > -   unsigned long max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
> > +   unsigned long min_util;
> > +   unsigned long max_util;
> > +
> > +   if (!static_branch_likely(&sched_uclamp_used))
> > +           return util;
> > +
> > +   min_util = READ_ONCE(rq->uclamp[UCLAMP_MIN].value);
> > +   max_util = READ_ONCE(rq->uclamp[UCLAMP_MAX].value);
> 
> I think moving the initialization is not required, the compiler should
> be smart enough to place theme where's better.

I did look at the generated code before posting. The compiler doesn't optimize
the reads. Likely because of the READ_ONCE() which implies volatile access.

> 
> >     if (p) {
> >             min_util = max(min_util, uclamp_eff_value(p, UCLAMP_MIN));
> > @@ -2371,6 +2396,11 @@ unsigned long uclamp_rq_util_with(struct rq *rq, 
> > unsigned long util,
> >  
> >     return clamp(util, min_util, max_util);
> >  }
> > +
> > +static inline bool uclamp_is_enabled(void)
> > +{
> > +   return static_branch_likely(&sched_uclamp_used);
> > +}
> 
> Looks like here we mix up terms, which can be confusing.
> AFAIKS, we use:
> - *_enabled for the sched class flags (compile time)
> - *_used    for the user-space opting in (run time)

I wanted to add a comment here.

I can rename it to uclamp_is_used() if you want.

Thanks

--
Qais Yousef

> 
> Thus, perhaps we can just use the same pattern used by the
> sched_numa_balancing static key:
> 
>   $ git grep sched_numa_balancing
>   kernel/sched/core.c:DEFINE_STATIC_KEY_FALSE(sched_numa_balancing);
>   kernel/sched/core.c:            static_branch_enable(&sched_numa_balancing);
>   kernel/sched/core.c:            
> static_branch_disable(&sched_numa_balancing);
>   kernel/sched/core.c:    int state = 
> static_branch_likely(&sched_numa_balancing);
>   kernel/sched/fair.c:    if (!static_branch_likely(&sched_numa_balancing))
>   kernel/sched/fair.c:    if (!static_branch_likely(&sched_numa_balancing))
>   kernel/sched/fair.c:    if (!static_branch_likely(&sched_numa_balancing))
>   kernel/sched/fair.c:    if (static_branch_unlikely(&sched_numa_balancing))
>   kernel/sched/sched.h:extern struct static_key_false sched_numa_balancing;
> 
> IOW: unconditionally define sched_uclamp_used as non static in core.c,
> and use it directly on schedutil too.
> 
> >  #else /* CONFIG_UCLAMP_TASK */
> >  static inline
> >  unsigned long uclamp_rq_util_with(struct rq *rq, unsigned long util,
> > @@ -2378,6 +2408,11 @@ unsigned long uclamp_rq_util_with(struct rq *rq, 
> > unsigned long util,
> >  {
> >     return util;
> >  }
> > +
> > +static inline bool uclamp_is_enabled(void)
> > +{
> > +   return false;
> > +}
> >  #endif /* CONFIG_UCLAMP_TASK */
> >  
> >  #ifdef arch_scale_freq_capacity
> 
> Best,
> Patrick

Re: [PATCH v5 2/2] sched/uclamp: Protect uclamp fast path code with static key

Reply via email to