On Thu, 2007-06-21 at 22:37 +0200, Roland Mainz wrote:
> Michael Corcoran wrote:
> > On Thu, 2007-06-14 at 09:12 -0500, Dave Marquardt wrote:
> > > "Roland" == Roland Mainz <roland.mainz at nrubsig.org> writes:
> > >
> > > Roland> Dave Marquardt wrote:
> > > >> "dsc" == David Comay <David.Comay at Sun.COM> writes:
> > > >>
> > > dsc> Here are my comments for round "three":
> > > dsc> usr/src/cmd/ksh/Makefile.com
> > > dsc> Lines 101-109 - As I indicated in an earlier review, I don't
> > > dsc> believe this is necessary. Both Nevada and the Solaris 10
> > > dsc> patch gate do large pages automatically (or so-called out of
> > > dsc> the box) and so including these options is unnecessary.
> > > dsc> However, I've cc'ed Bart Smaalders who is an expert in this
> > > dsc> area who can suggest whether or not it makes sense to include
> > > dsc> this.
> > > >>
> > > >> Just to be clear, this issue is about using -xpagesize_heap=64K and
> > > >> -xpagesize_stack=64K on SPARC.
> > >
> > > Roland> Right...
> >
> > Just to be even clearer, is this for all SPARC machines or just
> > UltraSPARC I and II machines?
>
> Erm... yes, it's for all SPARC machines.
>
> > The TLB architecture on US-III+, US-IV, US-IV+ tends not to use 64K page
> > sizes often due to the restriction of having essentially 2 pagesizes
> > within a process that work well together. US-III may not work well with
> > 2 page sizes though and thus we default to 8k in sun4u/cpu/us3_cheetah.c
> > cpu_fiximp.
>
> Erm... doesn't UltraSPARC >= 3cu have three pagetables which can handle
> one specific page size each (with the restriction that the current
> Solaris _implementation_ restricts this to 8k and 4M pages (which is IMO
> sub-optimal since the majority of smaller applications cannot make much
> use of 4M or 512k pages - for example the Xserver would run a lot better
> when offscreen surfaces would be mapped with 64k pages) ... and other
> good examples are gzip, bzip2 and libz.so.1, where the use of 64k pages
> shows a good performance improvement even on a Blade1000 which cannot
> handle 64k pages that well...) ?
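[For context, the options being debated are Sun Studio cc flags. In a Makefile they would look roughly like the fragment below; this is a sketch only - the actual variable names used in usr/src/cmd/ksh/Makefile.com are not shown in this thread:]

```make
# Sketch only: the real Makefile.com variable names are an assumption.
# Sun Studio cc accepts these flags when targeting SPARC; they request
# a preferred page size of 64K for the heap and the stack.
CFLAGS += -xpagesize_heap=64K -xpagesize_stack=64K
```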
The US-3+ to US-4+ CPUs have 3 TLBs: one 16-entry fully associative TLB
and two 512-entry TLBs (this is from memory, so some of it might be off
slightly). The two large TLBs can each be programmed to hold a single
page size for a given process. On some of the CPUs, I think the first
large TLB has to be programmed to 8K (or is hardcoded this way) and the
second one is available for any page size.

The 16-entry TLB has a number of restrictions which make it of limited
use. Locked entries can only go in the small TLB, so a number of entries
are permanently taken up at boot time: 1 for kernel text, 1 for kernel
data, 1 for OBP, and 4 for the kernel TSB. Then we also need to lock
TSBs for our process, which can lock 2 more entries. Thus, 8 or 9
entries will always be locked in the TLB. On processors which support
the quad-load physical ASI (I think this is US-4 and later) we no longer
need to lock the 4 kernel TSB entries, since we can access them via
physical address. There are a few other locations which occasionally use
locked mappings as well, and these use up some of the 16 available
entries for short periods. So the number of entries left over is
relatively small; it's not really a working set of 16 entries anymore.

So if a process uses more than 2 page sizes, the third size has to go
into the small TLB and will likely get kicked out quickly. Since there
are large TLBs for 8K pages, if we use 8K instead of 64K pages, those
mappings are likely to stay around longer.

In terms of the machine you're working on and how LPOOB performs, do you
have at least 1G of memory in that machine? If not, my last email
pointed to some comments stating that if you have less than 1G on a
US-II machine then LPOOB is disabled.
Thanks,
Mike

> And AFAIK there is a 16-entry table which can handle any supported page
> size - which at least makes the use of 64k pages usable for the stack
> (assuming the users don't consume lots of memory and/or don't make
> large jumps between multiple pages), since there aren't that many stack
> pages (and the usage/access pattern is very different from the heap
> usage/access pattern).
>
> > The Niagara cpus I think can handle all page sizes equally well.
>
> Right... AFAIK it's a derivative of the Spitfire MMU and doesn't have
> any restrictions like that.
>
> ----
>
> Bye,
> Roland