"Roland" == Roland Mainz <[EMAIL PROTECTED]> writes:

Roland> Dave Marquardt wrote:
>> 
>> "dsc" == David Comay <[EMAIL PROTECTED]> writes:
>> 
dsc> Here are my comments for round "three":
>> 
dsc> usr/src/cmd/ksh/Makefile.com
>> 
dsc> Lines 101-109 - As I indicated in an earlier review, I don't
dsc> believe this is necessary.  Both Nevada and the Solaris 10
dsc> patch gate do large pages automatically (or so-called out of
dsc> the box) and so including these options is unnecessary.
dsc> However, I've cc'ed Bart Smaalders who is an expert in this
dsc> area who can suggest whether or not it makes sense to include
dsc> this.
>> 
>> Just to be clear, this issue is about using -xpagesize_heap=64K and
>> -xpagesize_stack=64K on SPARC.

Roland> Right...

>> Let me just paste the lines from the file here:
>> 
>> 101 # Use 64k pages on SPARC (32bit+64bit ; based on benchmarking on an Ultra5
>> 102 # and a Blade1000 this is the optimum for small/medium-sized datasets (512k
>> 103 # pages are not available on Niagara CPUs and 4M pages are far too large)).
>> 104 # (Note that the stack should always be mapped with 64k pages (or better),
>> 105 # heap is optional. Both heap and stack should use the same stacksize since
>> 106 # some MMU types cannot handle more than one largepage size efficiently)
>> 107 sparc_CFLAGS   += -_cc=-xpagesize_stack=64K -_cc=-xpagesize_heap=64K
>> 108 sparcv9_CFLAGS += -_cc=-xpagesize_stack=64K -_cc=-xpagesize_heap=64K
>> 
>> This isn't too bad as far as it goes, but there are some potential
>> issues.
>> 
>> First, as David mentioned, with S10U1 and Nevada since about 2 years
>> ago, Solaris has used large pages "out of the box" (LPOOB), i.e. the
>> kernel picks large pages for application heap, stack and other
>> anonymous memory based on the system's TLB architecture.  I was one of
>> the developers of this code and integrated it into Solaris.  This
>> initial code was tuned for UltraSPARC-III/III+/IV/IV+ for sun4u and T1
>> (Niagara 1) for sun4v.  The code has tunables that allow you to tune
>> it differently, but I've not heard of anyone who has done so.

Roland> Well, this part is horribly underdocumented and has large holes
Roland> filled with hungry komodo dragons (if you "poke" around in the values
Roland> without knowing what the side-effects are...) ...
Roland> ... is there no script/tool which can be used to get/set such
Roland> tunables?

I agree.  The tunables are there for kernel developers and support
folks to tweak should the need arise, and as such, they're
undocumented.

>> More recently, more work was done for LPOOB, but I'm not as familiar
>> with that code.

Roland> Ok.. but even for B61 64k pages are not used by default (or $ ksh93 -c
Roland> 'pmap -s -x $$ ; true' # shows wrong values (note the "true", otherwise
Roland> ksh93 will |exec()| the last command and pmap shows the page map for
Roland> itself)).

So it appears there hasn't been tuning done to help this particular
case.

Roland> [snip]
>> Another issue is that you may still have some number of stray 8K pages
>> due to alignment issues, e.g. the heap doesn't start on a 64K
>> boundary.  It's possible you take care of that with a mapfile, but I
>> haven't reviewed all the Makefiles to see if that's the case.

Roland> Right now we don't use mapfiles for that since we only map stack&&heap
Roland> with 64k pages (AFAIK you're thinking about things like text/data
Roland> segments, right ?). We still have "stray" 8k pages coming from
Roland> allocations before the "-xpagesize_*=64k"-option has an effect - but
Roland> AFAIK we can ignore this since this memory is not used in the "hot
Roland> codepaths".

I'm specifically thinking of the BSS & heap boundary.  Heap starts at
the end of BSS, and last I knew we weren't mapping BSS on large pages.
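
To put a number on that, here is a rough sketch of the arithmetic.  The
helper name and the example address are made up for illustration; the
only assumptions are the 8K base page size and the 64K large page size
we've been discussing.

```shell
# Hypothetical helper: count the 8K pages between a heap start address
# and the next 64K boundary.  The address may be decimal or 0x-prefixed hex.
stray_8k_pages() {
    start=$(( $1 ))
    small=8192      # base page size on SPARC
    large=65536     # large page size requested via -xpagesize_heap=64K
    # Bytes from 'start' up to the next 64K boundary (0 if already aligned).
    gap=$(( (large - start % large) % large ))
    # Round up in case 'start' itself is not 8K-aligned.
    echo $(( (gap + small - 1) / small ))
}

stray_8k_pages 0x2A000   # illustrative heap start; prints 3
stray_8k_pages 0x30000   # already 64K-aligned; prints 0
```

So a heap that begins just past a 64K boundary can leave up to seven
8K pages in front of the first 64K page.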

>> Finally, you've locked yourself into 64K pages, and it's possible the
>> kernel will continue to be improved in the selection of large pages.
>> These improvements may not work if you've already selected 64K pages,
>> or perhaps the selection of 64K pages will interfere with good
>> performance.

Roland> Erm, based on the result we saw the 64k pages are the optimum... and as
Roland> the comment in the Makefile says: 8k pages (the default) are not the
Roland> optimum, 512k pages are too large and not available everywhere and 4M
Roland> and 256M pages are like hunting ducks with an M1 Abrams/TUSK.
Roland> The scenario where you may be right is that if the heap usage grows to a
Roland> size where 4M pages may become more useful (note: we ship a 64bit
Roland> version of ksh93 for this case... remember perl and ksh93 are used for
Roland> postprocessing and "glue" for bioinformatics applications where the
Roland> datasets quickly grow beyond 4GB (and yes, the AST memory allocator can
Roland> handle that properly)) then the choice for 64k pages may be sub-optimal
Roland> but I assume the kernel isn't that... uhm... "dumb" and overrides the
Roland> "hint" given by "-xpagesize_*=64k"-option.

Well, if you use -xpagesize_*=64K, we tend to be conservative and
think you mean it!  So, no, the kernel won't override your setting of
64K if your stack or heap grow large.

Roland> But in any case the comment for mapping the stack with 64k
Roland> remains as we did explicit optimisations in this area.

>> I'd suggest you might look at tuning the LPOOB mechanism to handle
>> UltraSPARC-II better,

Roland> And UltraSPARC-I, too - remember some distributions like MarTux support
Roland> these CPUs (and I wish OpenSolaris would keep support for this because
Roland> there are huge stockpiles of UltraSPARC-1-based machines at many
Roland> universities which could be "donated" to students&co.).

Right, it should be pretty easy to treat these the same way.

>> since that's one of the platforms you care about.
>> If you tuned LPOOB on UltraSPARC-II to force the use of 64K pages as
>> soon as the heap and stack were 64K in size or larger, much like is
>> done on sun4v, all programs that have stacks or heaps of that size or
>> larger would benefit, and possibly the whole system would benefit, as
>> it would have fewer TLB misses.

Roland> Right... but that is a general issue with a far larger scope
Roland> than this project... right now we only discuss ksh93 and the
Roland> use of 64k largepages which aims at small/midsized datasets.

Well, as someone who has worked on performance projects in the past at
Sun, I'm also interested in the overall performance of systems.  I
suppose it might be difficult to find a workload where this will hurt.
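
The "fewer TLB misses" argument comes down to TLB reach, i.e. how much
memory the TLB can map at once.  A quick sketch of that arithmetic
follows; the function name is made up, and the entry count of 64 is
only an assumed example value, so plug in the real figure for the CPU
in question.

```shell
# Hypothetical helper: TLB reach in KB = number of entries * page size in KB.
tlb_reach_kb() {
    entries=$1
    page_kb=$2
    echo $(( entries * page_kb ))
}

# Assumed example: a 64-entry TLB (check the actual CPU before relying on this).
tlb_reach_kb 64 8     # 8K pages:  prints 512  (KB)
tlb_reach_kb 64 64    # 64K pages: prints 4096 (KB)
```

For the same number of entries, 64K pages cover eight times the memory
that 8K pages do.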

I'm not convinced the design and code for adding better LPOOB tuning
for UltraSPARC I & II is all that large, particularly compared to all
the work you and others have put into ksh93, but certainly, it's
outside the scope of the ksh93 project, I agree.

>> I don't think Sun is interested in
>> investing in this tuning for US-II, as we don't sell many (any?) US-II
>> systems these days.  But it would be an interesting community project
>> if anyone is interested.

Roland> What about applying the Niagara1 defaults for UltraSPARC-1/2
Roland> CPUs, too ?

That was exactly my thought for the first round of tuning for
UltraSPARC 1&2.

I've opened an RFE, CR 6569725:

6569725 Add better large page out of box support for UltraSPARC I and II

As I said, I doubt Sun management will want to invest in this area due
to low return on investment for Sun, but certainly someone else in the
OpenSolaris community could take it on, or perhaps some Sun employee
in his spare time.
-- 
Dave Marquardt
Sun Microsystems, Inc.
Austin, TX
+1 512 401-1077 (SUN internal: x64077)