"dsc" == David Comay <David.Comay at Sun.COM> writes:

dsc> Here are my comments for round "three":

dsc> usr/src/cmd/ksh/Makefile.com

dsc>    Lines 101-109 - As I indicated in an earlier review, I don't
dsc>    believe this is necessary.  Both Nevada and the Solaris 10
dsc>    patch gate do large pages automatically (or so-called out of
dsc>    the box) and so including these options is unnecessary.
dsc>    However, I've cc'ed Bart Smaalders who is an expert in this
dsc>    area who can suggest whether or not it makes sense to include
dsc>    this.

Just to be clear, this issue is about using -xpagesize_heap=64K and
-xpagesize_stack=64K on SPARC.  Let me just paste the lines from file
here:

101 # Use 64k pages on SPARC (32bit+64bit ; based on benchmarking on an Ultra5
102 # and a Blade1000 this is the optimum for small/medium-sized datasets (512k
103 # pages are not available on Niagara CPUs and 4M pages are far too large)).
104 # (Note that the stack should always be mapped with 64k pages (or better),
105 # heap is optional. Both heap and stack should use the same stacksize since
106 # some MMU types cannot handle more than one largepage size efficiently)
107 sparc_CFLAGS   += -_cc=-xpagesize_stack=64K -_cc=-xpagesize_heap=64K
108 sparcv9_CFLAGS += -_cc=-xpagesize_stack=64K -_cc=-xpagesize_heap=64K

This isn't too bad as far as it goes, but there are some potential
issues.

First, as David mentioned, Solaris has used large pages "out of the
box" (LPOOB) in S10U1, and in Nevada for about two years now, i.e. the
kernel picks large pages for application heap, stack and other
anonymous memory based on the system's TLB architecture.  I was one of
the developers of this code and integrated it into Solaris.  The
initial code was tuned for UltraSPARC-III/III+/IV/IV+ on sun4u and for
T1 (Niagara 1) on sun4v.  The code has tunables that allow you to tune
it differently, but I've not heard of anyone who has done so.

More recently, further work has been done on LPOOB, but I'm not as
familiar with that code.

Since we tuned the sun4u code for the US-III family, we typically used
8K and 4M pages, to take advantage of the 512-entry, single-page-size
TLB on US-III and to avoid remapping segments very often.  For IV+, we
could also use 32M or 256M pages, but I believe their use was fairly
limited.  This works on UltraSPARC-II, but can't really take advantage
of the data TLB there, which, if I remember correctly, can handle
different page sizes at the same time but has only 64 entries.
UltraSPARC-T1 has a similar TLB.  For sun4v, we tuned LPOOB to move to
the next larger page size as soon as possible, in order to use as few
TLB entries as possible and try to avoid TLB misses and TLB thrashing.
T1 also understands translation storage buffers (TSBs), in either
hardware or the hypervisor (I don't remember which), so TLB misses are
a little cheaper, since they don't have to trap into Solaris.  But
Solaris also has to handle TSB misses.

Another issue is that you may still have a number of stray 8K pages
due to alignment issues, e.g. if the heap doesn't start on a 64K
boundary.  It's possible you take care of that with a mapfile, but I
haven't reviewed all the Makefiles to see if that's the case.

Finally, you've locked yourself into 64K pages, and it's possible the
kernel's selection of large pages will continue to improve.  Those
improvements may not take effect if you've already selected 64K pages,
or the choice of 64K pages may even interfere with good performance.

I'd suggest you might look at tuning the LPOOB mechanism to handle
UltraSPARC-II better, since that's one of the platforms you care
about.  If you tuned LPOOB on UltraSPARC-II to force the use of 64K
pages as soon as the heap and stack were 64K in size or larger, much
as is done on sun4v, all programs with stacks or heaps of that size or
larger would benefit, and possibly the whole system would benefit, as
it would have fewer TLB misses.  I don't think Sun is interested in
investing in this tuning for US-II, as we don't sell many (any?) US-II
systems these days.  But it would be an interesting community project
if anyone is interested.
-- 
Dave Marquardt
Sun Microsystems, Inc.
Austin, TX
+1 512 401-1077 (SUN internal: x64077)
