On Mon, Apr 09, 2007 at 02:53:09PM -0700, Ravikiran G Thirumalai wrote:
> On Mon, Apr 09, 2007 at 01:40:57PM -0700, Andrew Morton wrote:
> > On Mon, 9 Apr 2007 11:08:53 -0700
> > "Siddha, Suresh B" <[EMAIL PROTECTED]> wrote:
> > > -static DEFINE_PER_CPU(struct rq, runqueues);
> > > +static DEFINE_PER_CPU(struct rq, runqueues) ____cacheline_aligned_in_smp;
> >
> > Remember that this can consume up to (linesize-4 * NR_CPUS) bytes, which is
> > rather a lot.

At least on x86_64, this depends on cpu_possible_map and not NR_CPUS.

> > Remember also that the linesize on VSMP is 4k.
> >
> > And that putting a gap in the per-cpu memory like this will reduce its
> > overall cache-friendliness.
>
> The internode line size, yes. But Suresh is using ____cacheline_aligned_in_smp,
> which uses SMP_CACHE_BYTES (L1_CACHE_BYTES). So this does not align the
> per-cpu variable to 4k. However, if the motivation for this patch was a
> significant performance difference, then the above padding needs to be on
> the internode cacheline size, using ____cacheline_internodealigned_in_smp.

I see a 0.5% performance improvement on a database workload (which is a good
improvement for this workload). This patch minimizes the number of cache
lines touched during a remote task wakeup.

Kiran, can you educate me on when I am supposed to use
____cacheline_aligned_in_smp vs __cacheline_aligned_in_smp?
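For reference, here is how I read the relevant definitions in
include/linux/cache.h. This is a rough sketch written from memory, so the
exact #ifdef guards may differ from the actual header:

/* Rough sketch of include/linux/cache.h (from memory, guards trimmed) */

#ifndef SMP_CACHE_BYTES
#define SMP_CACHE_BYTES L1_CACHE_BYTES
#endif

/* Only aligns the object to the L1 line size. */
#define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))

#ifdef CONFIG_SMP
#define ____cacheline_aligned_in_smp ____cacheline_aligned
#else
#define ____cacheline_aligned_in_smp
#endif

/*
 * Aligns the object and also places it in the .data.cacheline_aligned
 * section, so all such objects get packed together.
 */
#define __cacheline_aligned					\
	__attribute__((__aligned__(SMP_CACHE_BYTES),		\
		       __section__(".data.cacheline_aligned")))

#ifdef CONFIG_SMP
#define __cacheline_aligned_in_smp __cacheline_aligned
#else
#define __cacheline_aligned_in_smp
#endif

/* Aligns to the internode line: 4k on vSMP, the L1 line size elsewhere. */
#ifdef CONFIG_SMP
#define ____cacheline_internodealigned_in_smp \
	__attribute__((__aligned__(1 << (INTERNODE_CACHE_SHIFT))))
#else
#define ____cacheline_internodealigned_in_smp
#endif
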
> As for the (linesize-4 * NR_CPUS) wastage, maybe we can place the cacheline
> aligned per-cpu data in another section, just like we do with the
> .data.cacheline_aligned section, but keep this new section between
> __per_cpu_start and __per_cpu_end?

Yes. But that will still waste some memory in the new section, if the data
elements are not multiples of 4k.
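Something along these lines is what I had in mind. This is only an untested
sketch: the DEFINE_PER_CPU_CACHE_ALIGNED macro, the .data.percpu.cache_aligned
section name and the linker-script placement are made up for illustration,
not taken from any tree.

/*
 * Untested sketch: mirror DEFINE_PER_CPU, but use a new per-cpu
 * subsection and pad each element to the internode line.
 */
#define DEFINE_PER_CPU_CACHE_ALIGNED(type, name)			\
	__attribute__((__section__(".data.percpu.cache_aligned")))	\
	__typeof__(type) per_cpu__##name				\
	____cacheline_internodealigned_in_smp

/* The arch linker script would then pull the new section in after the
 * regular per-cpu data, still between __per_cpu_start and __per_cpu_end,
 * so the per-cpu offset arithmetic keeps working:
 *
 *	.data.percpu : AT(ADDR(.data.percpu) - LOAD_OFFSET) {
 *		__per_cpu_start = .;
 *		*(.data.percpu)
 *		*(.data.percpu.cache_aligned)
 *		__per_cpu_end = .;
 *	}
 */

/* kernel/sched.c would then do: */
static DEFINE_PER_CPU_CACHE_ALIGNED(struct rq, runqueues);

Each element in that section still gets padded out to the 4k internode line,
which is the wastage I was referring to above.
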
thanks,
suresh
