Vmalloc disciplines do not zero out memory. The only explicit zeroing occurs in the vmresize() call (and only with the VM_RSZERO flag) or in the malloc-compatible calloc() call.

Phong
On Fri, Dec 6, 2013 at 10:18 AM, Glenn Fowler <[email protected]> wrote:
> On Thu, Dec 5, 2013 at 6:41 PM, Roland Mainz <[email protected]> wrote:
>> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <[email protected]> wrote:
>> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <[email protected]> wrote:
>> >> On 1 December 2013 17:26, Glenn Fowler <[email protected]> wrote:
>> >> > I believe this is related to vmalloc changes between 2013-05-31 and 2013-06-09
>> >> > re-run the tests with
>> >> >     export VMALLOC_OPTIONS=getmem=safe
>> >> > if that's the problem then it gives a clue on a general solution
>> >> > details after confirmation
>> >>
>> >> timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
>> >> while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
>> >> "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >>
>> >> real 34.60
>> >> user 33.27
>> >> sys 1.19
>> >>
>> >> VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
>> >> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> >> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> >> "${.sh.version}" ; nanosort <xxx >yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >>
>> >> real 15.34
>> >> user 14.67
>> >> sys 0.52
>> >>
>> >> So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is correct.
>> >>
>> >> What does VMALLOC_OPTIONS=getmem=safe do?
>> >
>> > vmalloc has an internal discipline/method for getting memory from the system
>> > several methods are available with varying degrees of thread safety etc.
>> > see src/lib/libast/vmalloc/vmdcsystem.c for the code
>> > and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
>> > description (vmalloc.3 update shortly)
>> >
>> > ** getmemory=f    enable f[,g] getmemory() functions if supported, all by default
>> > **     anon:        mmap(MAP_ANON)
>> > **     break|sbrk:  sbrk()
>> > **     native:      native malloc()
>> > **     safe:        safe sbrk() emulation via mmap(MAP_ANON)
>> > **     zero:        mmap(/dev/zero)
>> >
>> > i believe the performance regression with "anon" is that on linux
>> > mmap(0....MAP_ANON|MAP_PRIVATE...), which lets the system decide the
>> > address, returns adjacent (when possible) region addresses from
>> > highest to lowest order, and the reverse order at minimum tends to
>> > fragment more memory
>> > "zero" has the same hi=>lo characteristic
>> > i suspect it adversely affects the vmalloc coalescing algorithm but
>> > have not dug deeper
>> > for now the probe order in vmalloc/vmdcsystem.c was simply changed to
>> > favor "safe"
>>
>> Erm... since Irek prodded me by phone I looked at the issue...
>> ... some observations first (on Solaris 11/Illumos):
>>
>> 1. /dev/zero allocator vs. |sbrk()| allocator on Solaris:
>> -- snip --
>> $ VMALLOC_OPTIONS=getmem=zero timex ~/bin/ksh -c 'function nanosort {
>> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> "${.sh.version}" ; nanosort <xxx >yyy'
>> Version AIJMP 93v- 2013-10-08
>>
>> real 32.98
>> user 32.55
>> sys 0.32
>>
>> $ VMALLOC_OPTIONS=getmem=break timex ~/bin/ksh -c 'function nanosort {
>> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> "${.sh.version}" ; nanosort <xxx >yyy'
>> Version AIJMP 93v- 2013-10-08
>>
>> real 1:08.41
>> user 1:07.87
>> sys 0.38
>> -- snip --
>> ... which means the |sbrk()| allocator is twice as slow as the
>> /dev/zero allocator.
> sbrk is different from safebreak -- look at the vmdcsystem.c code
> the alpha will default to not probe just mapped pages for overbooking
> this will result in spurious and for the most part untraceable core dumps
> on systems running out of memory
>
>> 2. The default block size used by the normal |mmap(MAP_ANON)| allocator
>> is 1MB. This is IMHO far too small because there is IMO not enough space
>> for the coalescing algorithm to operate and a *lot* of fragmentation
>> occurs.
>> IMHO a _minimum_ page size of 4MB should be picked (as a side-effect
>> the shell would get 4MB or 2MB largepages on platforms like Solaris
>> automagically).
>
> default block size upped to 4Mi and pagesize=<n>[KMGP][i] can override in
> VMALLOC_OPTIONS for testing
>
>> 3. After each |mmap(MAP_ANON)| allocation the libast allocator
>> "manually" clears the obtained memory chunk with zero bytes. This is
>> IMO a *major* source of wasted CPU time (>= ~30%-38% of a
>> |_ast_malloc(1024*1024)|) because each memory page is instantiated by
>> writing zeros to it. If the clearing could be avoided (which is
>> unnecessary anyway) we'd easily win ~30%-38% and would *not* instantiate
>> pages which we do not use yet.
>
> can you pinpoint the code that does this -- the only memset(0) i see are
> due to explicit VM_RSZERO
>
>> Just to make it clear: Allocating a 1MB chunk of memory via
>> |mmap(MAP_ANON)| and a 128MB chunk of memory via |mmap(MAP_ANON)| has
>> *no* (visible) difference in performance until we touch the pages via
>> either read/execute or write accesses.
>> Currently the libast allocator code writes zeros into the whole chunk
>> of memory obtained via |mmap(MAP_ANON)| which pretty much ruins
>> performance because *all* pages are created physically instead of just
>> being some memory marked as "reserved".
>> If libast would stop writing into memory chunks directly after the
>> |mmap(MAP_ANON)| we could easily bump the allocation size up to 32MB
>> or better without any performance penalty...
>>
>> ----
>>
>> Bye,
>> Roland
>>
>> --
>>   __ .  . __
>>  (o.\ \/ /.o) [email protected]
>>   \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
>>   /O /==\ O\  TEL +49 641 3992797
>>  (;O/ \/ \O;)
_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users
