> -----Original Message-----
> From: Jack Steiner [mailto:[EMAIL PROTECTED]]
> Sent: March 28, 2007 9:53
> To: Zou, Nanhai
> Cc: Luck, Tony; Linux-IA64
> Subject: Re: [PATCH] - Optional method to purge the TLB on SN systems
>
> On Wed, Mar 28, 2007 at 08:46:44AM +0800, Zou Nan hai wrote:
> > On Wed, 2007-03-28 at 03:39, Jack Steiner wrote:
> >
> > > This patch adds an optional method for purging the TLB on SN IA64 systems.
> > > The change should not affect any non-SN system.
> > >
> > > Signed-off-by: Jack Steiner <[EMAIL PROTECTED]>
> > >
> > > ---
> > >
> > > +void
> > > +smp_flush_tlb_cpumask (cpumask_t xcpumask)
> > > +{
> > > +	unsigned short counts[NR_CPUS];
> > > +	cpumask_t cpumask = xcpumask;
> > > +	int count, mycpu, cpu, flush_mycpu = 0;
> > > +
> > > +	preempt_disable();
> > > +	mycpu = smp_processor_id();
> > > +
> > > +	for_each_cpu_mask(cpu, cpumask) {
> > > +		counts[cpu] = per_cpu(local_flush_count, cpu);
> > > +		mb();
> > > +		if (cpu == mycpu)
> > > +			flush_mycpu = 1;
> > > +		else
> > > +			smp_send_local_flush_tlb(cpu);
> > > +	}
> > > +
> > > +	if (flush_mycpu)
> > > +		smp_local_flush_tlb();
> > > +
> > > +	for_each_cpu_mask(cpu, cpumask) {
> > > +		count = 0;
> > > +		while(counts[cpu] == per_cpu(local_flush_count, cpu)) {
> >
> > Due to the 64k offset of percpu data, the same percpu variable on
> > different CPUs is very likely to be on the same cacheline at some
> > levels of the cache.
> >
> > So I think the operation on local_flush_count may be very cache
> > unfriendly...
>
> I was concerned about that, too, but testing finally convinced me that
> it was not an issue. I think the reason is that it takes a few hundred
> nanoseconds per cpu to send an IPI. So rather than a contended cache
> line, we have a line that is serially read by multiple cpus. Although
> contention can occur, typically multiple cpus are not trying to read
> the same line at the same time.
>
> For example (oversimplified), an IPI is sent to cpu 0 at time 0, to
> cpu 1 at time ~100, to cpu 2 at time ~200, etc. The IPI requires a
> chipset access that takes order-of-memory-access time. Assume it takes
> N usec for a cpu to recognize the IPI & call the TLB flushing code.
> Cpu 0 reads local_flush_count at time N, cpu 1 reads local_flush_count
> at time 100+N, etc. Very little contention, just serial access.
>
> --
>
> I tried a second algorithm where the local_flush_count was kept in
> node-local percpu data. That scheme was significantly slower, most
> likely because the cpu that initiates the flush will take N (# of
> cpus) cache misses to get an initial snapshot of the counts, then
> another N cache misses to check for completion. This assumes that
> a cpu doing a flush is not the most-recent cpu to do a flush.
> I believe this is typical.
>
> Keeping the counts in a single array (64 cpus/cache line)
> significantly reduces the number of cache misses.
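
(A rough user-space analogue of the snapshot-and-wait pattern in the
quoted smp_flush_tlb_cpumask() above, for illustration only: pthreads
stand in for IPIs, the initiating thread is separate from the "cpus",
and every name below is made up rather than taken from the patch.)

/*
 * Illustrative sketch, not from the patch: each worker thread ("cpu")
 * bumps its generation count when it services a flush request; the
 * initiator snapshots all counts, posts the requests, then spins until
 * every count has moved past its snapshot.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NCPUS 4

static atomic_ushort flush_count[NCPUS];	/* packed: many counts per cache line */
static atomic_bool   flush_pending[NCPUS];	/* stand-in for the IPI */
static atomic_bool   stop;

static void *fake_cpu(void *arg)
{
	int cpu = (int)(intptr_t)arg;

	while (!atomic_load(&stop)) {
		if (atomic_exchange(&flush_pending[cpu], false)) {
			/* the local TLB flush would happen here */
			atomic_fetch_add(&flush_count[cpu], 1);
		}
	}
	return NULL;
}

static void flush_all(void)
{
	unsigned short snap[NCPUS];
	int cpu;

	/* snapshot the counts, then post the requests */
	for (cpu = 0; cpu < NCPUS; cpu++) {
		snap[cpu] = atomic_load(&flush_count[cpu]);
		atomic_store(&flush_pending[cpu], true);
	}

	/* no acks: just wait for each count to advance past the snapshot */
	for (cpu = 0; cpu < NCPUS; cpu++)
		while (atomic_load(&flush_count[cpu]) == snap[cpu])
			;
}

int main(void)
{
	pthread_t tid[NCPUS];
	int cpu;

	for (cpu = 0; cpu < NCPUS; cpu++)
		pthread_create(&tid[cpu], NULL, fake_cpu, (void *)(intptr_t)cpu);

	flush_all();
	printf("all %d fake cpus completed a flush\n", NCPUS);

	atomic_store(&stop, true);
	for (cpu = 0; cpu < NCPUS; cpu++)
		pthread_join(tid[cpu], NULL);
	return 0;
}

The point of the counts is the same as in the patch: the initiator never
waits on an acknowledgement message, it only waits for each generation
count to move past the value it snapshotted.
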
>
> Another disadvantage of keeping counts in per-cpu data is that
> scanning the counts trashes the TLB for large NR_CPUS. The counts will
> be located in different 16MB granules. Each reference to a cpu's percpu
> data will require a different TLB entry to map the address used to
> reference the count. To scan N cpus, there will be ~2*N TLB misses,
> plus at the end of the flush the contents of the TLB are useless
> for most kernel or user use.
>
> --
>
> I tried a third algorithm where the counts were kept in a single array
> but each count was cacheline aligned to eliminate any possibility
> of contention. This was better than the second method that trashed
> the TLB: 1 TLB entry will cover the entire array. Unfortunately,
> this algorithm still incurs 2*N cache misses & is slower than
> the current algorithm.
>
>
> Does this explanation make sense? If anyone has an alternate
> algorithm, I'd be glad to try it.

Yes, putting the counts in a tight array could be better.
But isn't your original patch using the second algorithm?

Zou Nan hai

>
>
> -- jack
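
(To put rough numbers on the cache-footprint argument in the thread
above, here is a small stand-alone sketch. It assumes 2-byte counts and
the 128-byte cache lines implied by "64 cpus/cache line"; the layouts,
sizes, and names are illustrative only, not taken from any posted
patch.)

/*
 * Illustrative only: compare the cache-line footprint of a packed
 * count array (the current algorithm) with a cacheline-aligned array
 * (the "third algorithm").  Assumes 128-byte lines and 512 cpus.
 */
#include <stdio.h>

#define CACHE_LINE	128
#define NR_CPUS		512

/* packed layout: 64 two-byte counts share each cache line */
static unsigned short packed_counts[NR_CPUS];

/* cacheline-aligned layout: one count per line, no sharing at all */
struct padded_count {
	unsigned short count;
	char pad[CACHE_LINE - sizeof(unsigned short)];
};
static struct padded_count padded_counts[NR_CPUS];

/* distinct cache lines touched when scanning n counts of the given stride */
static unsigned lines_touched(unsigned n, unsigned stride)
{
	return (n * stride + CACHE_LINE - 1) / CACHE_LINE;
}

int main(void)
{
	printf("scanning %d counts, one pass:\n", NR_CPUS);
	printf("  packed array : %u cache lines\n",
	       lines_touched(NR_CPUS, sizeof(packed_counts[0])));
	printf("  padded array : %u cache lines\n",
	       lines_touched(NR_CPUS, sizeof(padded_counts[0])));
	printf("array sizes: packed %zu bytes, padded %zu bytes\n",
	       sizeof(packed_counts), sizeof(padded_counts));
	return 0;
}

With 512 cpus the packed array is covered by 8 cache lines per pass,
while one count per line needs 512; since the flush does a snapshot pass
plus a completion pass, that is roughly where the ~2*N misses for the
cacheline-aligned variant come from.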
