On Thursday 31 July 2008 21:27, Mel Gorman wrote:
> On (31/07/08 16:26), Nick Piggin didst pronounce:
> > I imagine it should be, unless you're using a CPU with separate TLBs for
> > small and huge pages, and your large data set is mapped with huge pages,
> > in which case you might now introduce *new* TLB contention between the
> > stack and the dataset :)
>
> Yes, this can happen particularly on older CPUs. For example, on my
> crash-test laptop the Pentium III there reports
>
> TLB and cache info:
> 01: Instruction TLB: 4KB pages, 4-way set assoc, 32 entries
> 02: Instruction TLB: 4MB pages, 4-way set assoc, 2 entries

Oh? Newer CPUs tend to have unified TLBs?

> > Also, interestingly I have actually seen some CPUs whose memory operations
> > get significantly slower when operating on large pages than small (in the
> > case when there is full TLB coverage for both sizes). This would make
> > sense if the CPU only implements a fast L1 TLB for small pages.
>
> It's also possible there is a micro-TLB involved that only supports small
> pages.

That is the case on a couple of contemporary CPUs I've tested with
(although granted they are engineering samples, but I don't expect that
to be the cause)

> > So for the vast majority of workloads, where stacks are relatively small
> > (or slowly changing), and relatively hot, I suspect this could easily
> > have no benefit at best and slowdowns at worst.
>
> I wouldn't expect an application with small stacks to request its stack
> to be backed by hugepages either. Ideally, it would be enabled because a
> large enough number of DTLB misses were found to be in the stack
> although catching this sort of data is tricky.

Sure, as I said, I have nothing against this functionality just because
it has the possibility to cause a regression. I was just pointing out
there are a few possibilities there, so it will take a particular type
of app to take advantage of it. I.e. it is not something you would ever
just enable "just in case the stack starts thrashing the TLB".
> > But I'm not saying that as a reason not to merge it -- this is no
> > different from any other hugepage allocations and as usual they have to
> > be used selectively where they help.... I just wonder exactly where huge
> > stacks will help.
>
> Benchmark wise, SPECcpu and SPEComp have stack-dependent benchmarks.
> Computations that partition problems with recursion I would expect to
> benefit as well as some JVMs that heavily use the stack (see how many docs
> suggest setting ulimit -s unlimited). Bit out there, but stack-based
> languages would stand to gain by this. The potential gap is for threaded
> apps as there will be stacks that are not the "main" stack. Backing those
> with hugepages depends on how they are allocated (malloc, it's easy,
> MAP_ANONYMOUS not so much).

Oh good, then there should be lots of possibilities to demonstrate it.

Thanks,
Nick
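
For reference, the separate-TLB concern can be put in numbers with a rough
"TLB reach" calculation (entries times page size). This is only a sketch:
the helper name tlb_reach_bytes is hypothetical, and the only inputs are
the two Pentium III instruction-TLB lines quoted above. The data-TLB
figures, which are what a stack would actually exercise, are not given in
the thread and are not assumed here.

```python
# Hypothetical helper (not from the thread): TLB reach = entries * page size.
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Return how much memory a TLB can map before it must evict an entry."""
    return entries * page_size_bytes


# Figures from the Pentium III instruction-TLB report quoted in the thread.
small_reach = tlb_reach_bytes(32, 4 * KIB)  # 4KB pages, 32 entries
huge_reach = tlb_reach_bytes(2, 4 * MIB)    # 4MB pages, 2 entries

print(f"4KB-page ITLB reach: {small_reach // KIB} KiB")
print(f"4MB-page ITLB reach: {huge_reach // MIB} MiB")
```

The huge-page TLB reaches further in bytes, but with only two entries
every huge-page mapping competes for the same two slots. On a CPU laid
out like this, moving the stack onto huge pages could put it in
competition with a dataset that is already mapped with huge pages.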