On Sun, May 08, 2016 at 10:08:55AM +0200, Mike Galbraith wrote:
> > Maybe give the criteria a bit margin, not just wakees tend to equal
> > llc_size, but the numbers are so wild to easily break the fragile
> > condition, like:
> 
> Seems lockless traversal and averages just let multiple CPUs select
> the same spot.  An atomic reservation (feature) when looking for an
> idle spot (also for fork) might fix it up.  Run the thing as RT,
> push/pull ensures that it reaches box saturation regardless of the
> number of messaging threads, whereas with fair class, any number > 1
> will certainly stack tasks before the box is saturated.

Yes, good idea: bringing order to the race to grab an idle CPU would
certainly help.
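To make the reservation idea concrete, here is a minimal userspace sketch.
The per-CPU cpu_claimed flag, NR_CPUS and the helper names are all
hypothetical, and C11 atomic_compare_exchange_strong() stands in for the
kernel's cmpxchg(). The only point is that exactly one of several racing
wakers can win a given idle CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8	/* hypothetical box size for the sketch */

/* Hypothetical flag per CPU: 0 = idle and unclaimed, 1 = claimed/busy. */
static atomic_int cpu_claimed[NR_CPUS];

/*
 * Try to reserve one specific CPU. Only one caller can win the 0 -> 1
 * transition, so two wakers racing on the same CPU cannot both get it.
 */
static bool try_claim_cpu(int cpu)
{
	int expected = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	return atomic_compare_exchange_strong(&cpu_claimed[cpu], &expected, 1);
}

/* Scan for an idle CPU and return the first one we manage to claim, or -1. */
static int select_and_claim_idle_cpu(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		/* Cheap racy pre-check; the compare-exchange decides the winner. */
		if (atomic_load_explicit(&cpu_claimed[cpu], memory_order_relaxed) == 0 &&
		    try_claim_cpu(cpu))
			return cpu;
	}
	return -1;
}

/* Drop the reservation once the CPU is idle again. */
static void release_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store(&cpu_claimed[cpu], 0);
}
```

A waker that loses the compare-exchange simply moves on to the next
candidate instead of stacking on a CPU someone else already picked.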

In addition, I would argue that beefing up idle balancing may be a more
productive way to spread load, since work stealing simply does what needs
to be done. And it seems to have been (subconsciously) neglected in this
case. :)
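Roughly, the work-stealing side looks like the toy model below: a CPU that
is about to go idle pulls a waiting task from the busiest runqueue.
Runqueues are reduced to hypothetical nr_running counters and there is no
locking or load weighting, so this sketches the idea only, not
idle_balance() itself.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical */

/* Hypothetical model: each runqueue is just a count of runnable tasks. */
static int nr_running[NR_CPUS];

/*
 * Called when 'this_cpu' is about to go idle: find the busiest other
 * runqueue and, if it has a task waiting behind the running one, steal it.
 * Returns the CPU stolen from, or -1 if nothing was worth taking.
 */
static int idle_steal(int this_cpu)
{
	int busiest = -1;

	assert(this_cpu >= 0 && this_cpu < NR_CPUS);
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == this_cpu)
			continue;
		if (busiest < 0 || nr_running[cpu] > nr_running[busiest])
			busiest = cpu;
	}
	/* One running task and nothing queued behind it: leave it alone. */
	if (busiest < 0 || nr_running[busiest] < 2)
		return -1;
	nr_running[busiest]--;
	nr_running[this_cpu]++;
	return busiest;
}
```

The appeal is that the idle CPU corrects whatever placement mistake the
wakeup path made, after the fact, without the wakeup path having to
predict anything.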

Regarding wake_wide(), it seems the M:N ratio is 1:24, not 6:6*24. If so,
the slave will be 0 forever (since last_wakee is never flipped).
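For reference, the decision is roughly the following. This is a userspace
model written from memory of wake_wide() in kernel/sched/fair.c (master and
slave are the waker's and wakee's wakee_flips, factor is sd_llc_size), so
treat the details as approximate. With slave stuck at 0, the first test
always fires and the function can never return 1, however large master
grows.

```c
#include <assert.h>

/*
 * Approximate model of the wake_wide() test. Returns 1 to spread the
 * wakeup wide, 0 to allow an affine wakeup.
 */
static int wake_wide_model(unsigned int waker_flips, unsigned int wakee_flips,
			   unsigned int factor)
{
	unsigned int master = waker_flips;
	unsigned int slave = wakee_flips;

	/* Order the pair so that master >= slave. */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}
	/* Both sides must flip often, and master must dominate slave by factor. */
	if (slave < factor || master < slave * factor)
		return 0;
	return 1;
}
```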

Basically, whenever a waker has more than one wakee, its wakee_flips will
readily grow very large (as last_wakee alternates), whereas when a waker
has zero or one wakee, its wakee_flips will simply stay at 0.
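A toy model of the flip accounting in record_wakee() shows the asymmetry.
It is written from memory and simplified: struct task is a hypothetical
stand-in for task_struct, and the kernel's once-per-second halving is
modelled as an explicit decay call. Alternating between two wakees bumps
the counter on every wakeup, while waking the same task over and over
bumps it once and then decay brings it back to 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant task_struct fields. */
struct task {
	const struct task *last_wakee;
	unsigned int wakee_flips;
};

/* The counter only moves when the waker wakes a different task than last time. */
static void record_wakee_model(struct task *waker, const struct task *wakee)
{
	if (waker->last_wakee != wakee) {
		waker->last_wakee = wakee;
		waker->wakee_flips++;
	}
}

/* Models the periodic halving of wakee_flips. */
static void decay_flips(struct task *t)
{
	t->wakee_flips >>= 1;
}
```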

So recording only last_wakee does not seem right, unless there is some
other good reason for it. If there is not, counting the number of wakeups
instead should work better, and would then let the statistics do their
job.
