Yuyang Du <[email protected]> writes:
>> > @@ -667,18 +662,17 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, 
>> > struct sched_entity *se)
>> >  #ifdef CONFIG_SMP
>> >  static unsigned long task_h_load(struct task_struct *p);
>> >  
>> > -static inline void __update_task_entity_contrib(struct sched_entity *se);
>> > -
>> >  /* Give new task start runnable values to heavy its load in infant time */
>> >  void init_task_runnable_average(struct task_struct *p)
>> >  {
>> >    u32 slice;
>> > +  struct sched_avg *sa = &p->se.avg;
>> >  
>> > -  p->se.avg.decay_count = 0;
>> > +  sa->last_update_time = 0;
>> > +  sa->period_contrib = 0;
>> 
>> sa->period_contrib = slice;
>
> period_contrib should be strictly < 1024. I suppose sched_slice does not 
> guarantee that.
> So here I will give it 1023 to heavy the load.

Oh right, ignore that then.

>  
>> > +static __always_inline u64 decay_load64(u64 val, u64 n)
>> > +{
>> > +  if (likely(val <= UINT_MAX))
>> > +          val = decay_load(val, n);
>> > +  else {
>> > +          /*
>> > +           * LOAD_AVG_MAX can last ~500ms (=log_2(LOAD_AVG_MAX)*32ms).
>> > +           * Since we have so big runnable load_avg, it is impossible
>> > +           * load_avg has not been updated for such a long time. So
>> > +           * LOAD_AVG_MAX is enough here.
>> > +           */
>> 
>> I mean, LOAD_AVG_MAX is irrelevant - the constant could just as well be
>> 1<<20, or whatever, yes? In fact, if you're going to then turn it into a
>> fraction of 1<<10, just do (with whatever temporaries you find most 
>> tasteful):
>> 
>> val *= (u32) decay_load(1 << 10, n);
>> val >>= 10;
>> 
>
> LOAD_AVG_MAX is selected on purpose. The val arriving here specifies that it 
> is really
> big. So the decay_load may not decay it to 0 even period n is not small. If 
> we use 1<<10
> here, n=10*32 will decay it to 0, but val (larger than 1<<32) can
> survive.

But when you do *1024 / LOAD_AVG_MAX it will go back to zero. In general
the code you have now is exactly equivalent to factor = decay_load(1<<10,n)
ignoring possible differences in rounding.

>> > +/* Update task and its cfs_rq load average */
>> > +static inline void update_load_avg(struct sched_entity *se, int update_tg)
>> >  {
>> > +  struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> > +  u64 now = cfs_rq_clock_task(cfs_rq);
>> > +
>> >    /*
>> > +   * Track task load average for carrying it to new CPU after migrated
>> >     */
>> > +  if (entity_is_task(se))
>> > +          __update_load_avg(now, &se->avg, se->on_rq * se->load.weight);
>> >  
>> > +  update_cfs_rq_load_avg(now, cfs_rq);
>> >  
>> > +  if (update_tg)
>> > +          update_tg_load_avg(cfs_rq);
>> >  }
>> 
>> I personally find this very confusing, in that update_load_avg is doing
>> more to se->cfs_rq, and in fact on anything other than a task, it isn't
>> touching the se at all (instead, it touches _se->parent_ even).
>  
> What is confusing? The naming?

The fact that it sounds like it is operating on three things (se,
cfs_rq_of(se) and se->parent) but looks like it is only operating on se.
In particular, half/most of the time it isn't even operating on se at all.

>
> About the overflow problem, maybe I can just fall back to do load_avg / 47742
> for every update, then everything would be in nature the same range with the
> current code.

Hmm, that might work.

It would still be good to see pipe_test numbers at various cgroup depths.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to