On Mon, Apr 09, 2007 at 01:40:57PM -0700, Andrew Morton wrote:
> On Mon, 9 Apr 2007 11:08:53 -0700
> "Siddha, Suresh B" <[EMAIL PROTECTED]> wrote:
>
> > Align the per cpu runqueue to the cacheline boundary. This will minimize the
> > number of cachelines touched during remote wakeup.
> >
> > Signed-off-by: Suresh Siddha <[EMAIL PROTECTED]>
> > ---
> >
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index b9a6837..eca33c5 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -278,7 +278,7 @@ struct rq {
> >  	struct lock_class_key rq_lock_key;
> >  };
> >
> > -static DEFINE_PER_CPU(struct rq, runqueues);
> > +static DEFINE_PER_CPU(struct rq, runqueues) ____cacheline_aligned_in_smp;
>
> Remember that this can consume up to (linesize-4 * NR_CPUS) bytes, which is
> rather a lot.
>
> Remember also that the linesize on VSMP is 4k.
>
> And that putting a gap in the per-cpu memory like this will reduce its
> overall cache-friendliness.
>
The internode line size, yes. But Suresh is using ____cacheline_aligned_in_smp, which aligns to SMP_CACHE_BYTES (L1_CACHE_BYTES), so this does not align the per-cpu variable to 4k.

However, if the motivation for this patch was a significant performance difference, then the padding needs to be to the internode cacheline size, using ____cacheline_internodealigned_in_smp instead. ____cacheline_internodealigned_in_smp aligns a data structure to the internode line size, which is 4k on vSMPowered systems and the L1 line size on all other architectures.

As for the (linesize-4 * NR_CPUS) wastage, maybe we can place the cacheline-aligned per-cpu data in another section, just like we do with the .data.cacheline_aligned section, but keep this new section between __per_cpu_start and __per_cpu_end?
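For reference, this is roughly how the two annotations are defined in
include/linux/cache.h and include/asm-x86_64/cache.h in this era
(paraphrased from memory, not a verbatim copy of the headers):

#ifdef CONFIG_SMP
#define ____cacheline_aligned_in_smp	____cacheline_aligned	/* SMP_CACHE_BYTES */
#else
#define ____cacheline_aligned_in_smp
#endif

#ifdef CONFIG_X86_VSMP
#define INTERNODE_CACHE_SHIFT	(12)	/* vSMP internode cacheline is 4k */
#endif

#ifndef INTERNODE_CACHE_SHIFT
#define INTERNODE_CACHE_SHIFT	L1_CACHE_SHIFT	/* everyone else: L1 line size */
#endif

#ifdef CONFIG_SMP
#define ____cacheline_internodealigned_in_smp \
	__attribute__((__aligned__(1 << (INTERNODE_CACHE_SHIFT))))
#else
#define ____cacheline_internodealigned_in_smp
#endif

So if the wakeup numbers really depend on keeping each runqueue on its
own internode line, the annotation would become:

static DEFINE_PER_CPU(struct rq, runqueues) ____cacheline_internodealigned_in_smp;

And a rough sketch of the separate-section idea, purely illustrative:
the macro and section names below (DEFINE_PER_CPU_CACHELINE_ALIGNED,
.data.percpu.cacheline_aligned) are made up for this mail, they do not
exist in the tree.

/* include/linux/percpu.h */
#define DEFINE_PER_CPU_CACHELINE_ALIGNED(type, name)			\
	__attribute__((__section__(".data.percpu.cacheline_aligned")))	\
	__typeof__(type) per_cpu__##name ____cacheline_internodealigned_in_smp

/* arch linker scripts, still inside the per-cpu area */
	__per_cpu_start = .;
	.data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
		*(.data.percpu)
		*(.data.percpu.cacheline_aligned)
	}
	__per_cpu_end = .;

That way only the variables which ask for the alignment pay the padding
cost, and they are grouped at the end of the per-cpu area instead of
punching holes through the rest of the per-cpu data.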