On Mon, May 09, 2016 at 05:45:40AM +0200, Mike Galbraith wrote:
> On Mon, 2016-05-09 at 02:57 +0800, Yuyang Du wrote:
> > On Sun, May 08, 2016 at 10:08:55AM +0200, Mike Galbraith wrote:
> > > > Maybe give the criteria a bit margin, not just wakees tend to equal 
> > > > llc_size,
> > > > but the numbers are so wild to easily break the fragile condition, like:
> > > 
> > > Seems lockless traversal and averages just lets multiple CPUs select
> > > the same spot.  An atomic reservation (feature) when looking for an
> > > idle spot (also for fork) might fix it up.  Run the thing as RT,
> > > push/pull ensures that it reaches box saturation regardless of the
> > > number of messaging threads, whereas with fair class, any number > 1
> > > will certainly stack tasks before the box is saturated.
> > 
> > Yes, good idea, bringing order to the race to grab idle CPU is absolutely
> > helpful.
> 
> Well, good ideas work, as yet this one helps jack diddly spit.

Then a valid question is whether it is the selection itself that is screwed
up in a case like this, which should always be asked.
 
> > In addition, I would argue maybe beefing up idle balancing is a more
> > productive way to spread load, as work-stealing just does what needs
> > to be done. And seems it has been (sub-unconsciously) neglected in this
> > case, :)
> > 
> > Regarding wake_wide(), it seems the M:N is 1:24, not 6:6*24, if so,
> > the slave will be 0 forever (as last_wakee is never flipped).
> 
> Yeah, it's irrelevant here, this load is all about instantaneous state.
>  I could use a bit more of that, reserving on the wakeup side won't
> help this benchmark until everything else cares.  One stack, and it's
> game over.  It could help generic utilization and latency some.. but it
> seems kinda unlikely it'll be worth the cycle expenditure.

Yes and no. It depends on how efficient work-stealing is compared to
selection. But remember, at the end of the day it is the wakee CPU that
measures the latency, and that CPU does not care whether it was selected
or it stole the task.
 
> > Basically whenever a waker has more than 1 wakee, the wakee_flips
> > will comfortably grow very large (with last_wakee alternating),
> > whereas when a waker has 0 or 1 wakee, the wakee_flips will just be 0.
> 
> Yup, it is a heuristic, and like all of those, imperfect.  I've watched
> it improving utilization in the wild though, so won't mind that until I
> catch it doing really bad things.
 
> > So recording only the last_wakee seems not right unless you have other
> > good reason. If not the latter, counting waking wakee times should be
> > better, and then allow the statistics to happily play.

Hmm... should we try removing the last_wakee recording?
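
To make the flip behaviour discussed above concrete, here is a minimal
userspace sketch, not the kernel code itself. struct task, MAX_WAKEES and
simulate_flips() are hypothetical names made up for illustration, and the
periodic decay of wakee_flips is deliberately left out. It assumes one
waker that wakes its wakees round-robin and wakees that never wake anyone.

```c
#include <stddef.h>

#define MAX_WAKEES 24	/* assumption: matches the 1:24 case in this thread */

/* Hypothetical stand-in for the two per-task fields discussed here. */
struct task {
	struct task *last_wakee;
	unsigned int wakee_flips;
};

/*
 * Flip accounting as described in this thread: count a flip only when
 * the wakee differs from the previous one. Decay is omitted (assumption).
 */
static void record_wakee(struct task *waker, struct task *wakee)
{
	if (waker->last_wakee != wakee) {
		waker->last_wakee = wakee;
		waker->wakee_flips++;
	}
}

/*
 * One waker wakes nr_wakees tasks round-robin, nr_wakeups times in total.
 * Returns the waker's flip count; the wakees never wake anybody, so their
 * own wakee_flips stays 0 throughout.
 */
static unsigned int simulate_flips(size_t nr_wakees, unsigned int nr_wakeups)
{
	struct task wakees[MAX_WAKEES] = { { NULL, 0 } };
	struct task waker = { NULL, 0 };
	unsigned int i;

	if (nr_wakees == 0 || nr_wakees > MAX_WAKEES)
		return 0;

	for (i = 0; i < nr_wakeups; i++)
		record_wakee(&waker, &wakees[i % nr_wakees]);

	return waker.wakee_flips;
}
```

Under these assumptions simulate_flips(1, 1000) returns 1 (the single
initial flip, which decay would bring back to 0), while
simulate_flips(24, 1000) returns 1000, i.e. one flip per wakeup. The
count says nothing about how many distinct wakees there are beyond
"more than one".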
