"Roland" == Roland Mainz <[EMAIL PROTECTED]> writes:

Roland> Dave Marquardt wrote:
>>
>> "dsc" == David Comay <[EMAIL PROTECTED]> writes:
>>
>> dsc> Here are my comments for round "three":
>>
>> dsc> usr/src/cmd/ksh/Makefile.com
>>
>> dsc> Lines 101-109 - As I indicated in an earlier review, I don't
>> dsc> believe this is necessary. Both Nevada and the Solaris 10
>> dsc> patch gate do large pages automatically (or so-called out of
>> dsc> the box) and so including these options is unnecessary.
>> dsc> However, I've cc'ed Bart Smaalders who is an expert in this
>> dsc> area who can suggest whether or not it makes sense to include
>> dsc> this.
>>
>> Just to be clear, this issue is about using -xpagesize_heap=64K and
>> -xpagesize_stack=64K on SPARC.

Roland> Right...

>> Let me just paste the lines from file here:
>>
>> 101 # Use 64k pages on SPARC (32bit+64bit ; based on benchmarking on an Ultra5
>> 102 # and a Blade1000 this is the optimum for small/medium-sized datasets (512k
>> 103 # pages are not available on Niagara CPUs and 4M pages are far too large)).
>> 104 # (Note that the stack should always be mapped with 64k pages (or better),
>> 105 # heap is optional. Both heap and stack should use the same stacksize since
>> 106 # some MMU types cannot handle more than one largepage size efficiently)
>> 107 sparc_CFLAGS += -_cc=-xpagesize_stack=64K -_cc=-xpagesize_heap=64K
>> 108 sparcv9_CFLAGS += -_cc=-xpagesize_stack=64K -_cc=-xpagesize_heap=64K
>>
>> This isn't too bad as far as it goes, but there are some potential
>> issues.
>>
>> First, as David mentioned, with S10U1 and Nevada since about 2 years
>> ago, Solaris has used large pages "out of the box" (LPOOB), i.e. the
>> kernel picks large pages for application heap, stack and other
>> anonymous memory based on the system's TLB architecture. I was one of
>> the developers of this code and integrated it into Solaris. This
>> initial code was tuned for UltraSPARC-III/III+/IV/IV+ for sun4u and T1
>> (Niagara 1) for sun4v. The code has tunables that allow you to tune
>> it differently, but I've not heard of anyone who has done so.

Roland> Well, this part is horribly underdocumented and has large holes
Roland> filled with hungry komodo dragons (if you "poke" around in the values
Roland> without knowing what the side-effects are...) ...
Roland> ... is there no script/tool which can be used to get/set such
Roland> tunables?

I agree. The tunables are there for kernel developers and support
folks to tweak should the need arise, and as such, they're
undocumented.

>> More recently, more work was done for LPOOB, but I'm not as familiar
>> with that code.

Roland> Ok.. but even for B61 64k pages are not used by default (or $ ksh93 -c
Roland> 'pmap -s -x $$ ; true' # shows wrong values (note the "true", otherwise
Roland> ksh93 will |exec()| the last command and pmap shows the page map for
Roland> itself)).

So it appears there hasn't been tuning done to help this particular
case.

Roland> [snip]

>> Another issue is that you may still have some number of stray 8K pages
>> due to alignment issues, e.g. the heap doesn't start on a 64K
>> boundary. It's possible you take care of that with a mapfile, but I
>> haven't reviewed all the Makefiles to see if that's the case.

Roland> Right now we don't use mapfiles for that since we only map stack&&heap
Roland> with 64k pages (AFAIK you're thinking about things like text/data
Roland> segments, right ?). We still have "stray" 8k pages coming from
Roland> allocations before the "-xpagesize_*=64k"-option has an effect - but
Roland> AFAIK we can ignore this since this memory is not used in the "hot
Roland> codepaths".

I'm specifically thinking of the BSS & heap boundary. Heap starts at
the end of BSS, and last I knew we weren't mapping BSS on large pages.

>> Finally, you've locked yourself into 64K pages, and it's possible the
>> kernel will continue to be improved in the selection of large pages.
>> These improvements may not work if you've already selected 64K pages,
>> or perhaps the selection of 64K pages will interfere with good
>> performance.

Roland> Erm, based on the result we saw the 64k pages are the optimum... and as
Roland> the comment in the Makefile says: 8k pages (the default) are not the
Roland> optimum, 512k pages are too large and not available everywhere and 4M
Roland> and 256M pages are like hunting ducks with an M1 Abrams/TUSK.
Roland> The scenario where you may be right is that if the heap usage grows to a
Roland> size where 4M pages may become more useful (note: we ship a 64bit
Roland> version of ksh93 for this case... remember perl and ksh93 are used for
Roland> postprocessing and "glue" for bioinformatics applications where the
Roland> datasets quickly grow beyond 4GB (and yes, the AST memory allocator can
Roland> handle that properly)) then the choice for 64k pages may be sub-optimal
Roland> but I assume the kernel isn't that... uhm... "dumb" and overrides the
Roland> "hint" given by "-xpagesize_*=64k"-option.

Well, if you use -xpagesize_*=64K, we tend to be conservative and
think you mean it! So, no, the kernel won't override your setting of
64K if your stack or heap grows large.

Roland> But in any case the comment for mapping the stack with 64k
Roland> remains as we did explicit optimisations in this area.

>> I'd suggest you might look at tuning the LPOOB mechanism to handle
>> UltraSPARC-II better,

Roland> And UltraSPARC-I, too - remember some distributions like MarTux support
Roland> these CPUs (and I wish OpenSolaris would keep support for this because
Roland> there are huge stockpiles of UltraSPARC-1-based machines at many
Roland> universities which could be "donated" to students&co.).

Right, it should be pretty easy to treat these the same way.

>> since that's one of the platforms you care about.
>> If you tuned LPOOB on UltraSPARC-II to force the use of 64K pages as
>> soon as the heap and stack were 64K in size or larger, much like is
>> done on sun4v, all programs that have stacks or heaps of that size or
>> larger would benefit, and possibly the whole system would benefit, as
>> it would have fewer TLB misses.

Roland> Right... but that is a general issue with a far larger scope
Roland> than this project... right now we only discuss ksh93 and the
Roland> use of 64k largepages which aims at small/midsized datasets.

Well, as someone who has worked on performance projects in the past at
Sun, I'm also interested in the overall performance of systems. I
suppose it might be difficult to find a workload where this will hurt.
I'm not convinced the design and code for adding better LPOOB tuning
for UltraSPARC I & II is all that large, particularly compared to all
the work you and others have put into ksh93, but certainly, it's
outside the scope of the ksh93 project, I agree.

>> I don't think Sun is interested in
>> investing in this tuning for US-II, as we don't sell many (any?) US-II
>> systems these days. But it would be an interesting community project
>> if anyone is interested.

Roland> What about applying the Niagara1 defaults for UltraSPARC-1/2
Roland> CPUs, too ?

That was exactly my thought for the first round of tuning for
UltraSPARC 1&2. I've opened an RFE, CR 6569725:

6569725 Add better large page out of box support for UltraSPARC I and II

As I said, I doubt Sun management will want to invest in this area due
to low return on investment for Sun, but certainly someone else in the
OpenSolaris community could take it on, or perhaps some Sun employee
in his spare time.

-- 
Dave Marquardt
Sun Microsystems, Inc.
Austin, TX
+1 512 401-1077
(SUN internal: x64077)
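
For the alignment point raised in the thread (stray 8K pages when the
heap does not start on a 64K boundary), the arithmetic can be sketched
in plain shell. This is only an illustration: the function name
stray_8k_pages is hypothetical, and it assumes that everything between
the heap start and the next 64K-aligned address is backed by 8K base
pages, which simplifies what the kernel actually does.

```shell
# Hypothetical helper (not part of ksh93 or Solaris): count the 8K pages
# between a heap start address and the next 64K boundary. Assumption:
# that unaligned lead-in region cannot be backed by a 64K page.
stray_8k_pages() {
    s8k_start=$1        # heap start address, decimal
    s8k_small=8192      # base page size on SPARC (8K)
    s8k_large=65536     # requested large page size (64K)
    # Round the start address up to the next 64K boundary.
    s8k_aligned=$(( ($s8k_start + $s8k_large - 1) / $s8k_large * $s8k_large ))
    echo $(( ($s8k_aligned - $s8k_start) / $s8k_small ))
}

# A heap starting at 0x12A000 (1220608 decimal) is 0xA000 past a 64K
# boundary, so 0x6000 bytes remain before the next boundary.
stray_8k_pages 1220608
```

With these numbers the function prints 3; a heap that already starts on
a 64K boundary yields 0.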
