On Tue, Jul 30, 2019 at 11:36:17AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 22, 2019 at 01:33:43PM -0400, Rik van Riel wrote:

> > +static bool
> > +enqueue_entity_groups(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> > +{
> > +   /*
> > +    * When enqueuing a sched_entity, we must:
> > +    *   - Update loads to have both entity and cfs_rq synced with now.
> > +    *   - Add its load to cfs_rq->runnable_avg
> > +    *   - For group_entity, update its weight to reflect the new share of
> > +    *     its group cfs_rq
> > +    *   - Add its new weight to cfs_rq->load.weight
> > +    */
> > +   if (!update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH))
> > +           return false;
> > +
> > +   update_cfs_group(se);
> > +   return true;
> > +}

> No functional change, but you did make update_cfs_group() conditional. Now that
> looks OK, but maybe you can do that part in a separate patch with a
> little justification of its own.

To record (and extend) our discussion from IRC yesterday; I now do think
the above is in fact a problem.

The thing is that update_cfs_group() does not rely solely on the tg state;
it also contains magic to deal with ramp up; for which you later
introduce that max_h_load thing.

Specifically (re)read the second part of the comment describing
calc_group_shares() where it explains the ramp up:

 * The problem with it is that because the average is slow -- it was designed
 * to be exactly that of course -- this leads to transients in boundary
 * conditions. In specific, the case where the group was idle and we start the
 * one task. It takes time for our CPU's grq->avg.load_avg to build up,
 * yielding bad latency etc..

 (and further)

So by not always calling this (and not updating h_load) you fail to take
advantage of this.

So I would suggest keeping update_cfs_group() unconditional, and
recomputing the h_load for the entire hierarchy on enqueue.
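
To put rough numbers on the ramp-up argument, here is a userspace sketch.
It is not kernel code: struct grq_model and both helpers are made-up
names, the arithmetic only loosely follows calc_group_shares(), and the
MIN_SHARES value and the lack of load scaling are simplifying assumptions.

```c
#include <assert.h>

#define MIN_SHARES 2L	/* assumed lower clamp for this model */

/* Hypothetical stand-in for one CPU's group runqueue (grq). */
struct grq_model {
	long load_weight;		/* instantaneous weight of queued entities */
	long load_avg;			/* slow-moving average, lags after idle */
	long tg_load_avg_contrib;	/* this CPU's last contribution to the tg sum */
};

static long clamp_long(long v, long lo, long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Shares computed from the slow average only: no ramp-up handling. */
static long shares_avg_only(long tg_shares, long tg_load_avg,
			    const struct grq_model *grq)
{
	long shares = 0;

	if (tg_load_avg)
		shares = tg_shares * grq->load_avg / tg_load_avg;

	return clamp_long(shares, MIN_SHARES, tg_shares);
}

/*
 * Shares with the ramp-up handling: use the instantaneous weight
 * whenever the average has not caught up with it yet.
 */
static long shares_with_rampup(long tg_shares, long tg_load_avg,
			       const struct grq_model *grq)
{
	long load, tg_weight, shares;

	/* The global sum is assumed to include this CPU's contribution. */
	assert(tg_load_avg >= grq->tg_load_avg_contrib);

	load = grq->load_weight > grq->load_avg ?
	       grq->load_weight : grq->load_avg;

	/* Swap this CPU's stale contribution for the current load. */
	tg_weight = tg_load_avg - grq->tg_load_avg_contrib + load;

	shares = tg_shares * load;
	if (tg_weight)
		shares /= tg_weight;

	return clamp_long(shares, MIN_SHARES, tg_shares);
}
```

Take a group with 1024 shares whose only load so far (tg load_avg 1024)
sits on another CPU, and enqueue one task of weight 1024 on a CPU where
the group was idle (load_avg 0, contrib 0). The average-only variant
gives 2 (the clamp), the ramp-up variant gives 512. In this model the
larger value is only seen if the computation is redone at enqueue time.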
