On Thu, Dec 5, 2013 at 6:41 PM, Roland Mainz <[email protected]> wrote:
> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <[email protected]> wrote:
> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <[email protected]> wrote:
> >> On 1 December 2013 17:26, Glenn Fowler <[email protected]> wrote:
> >> > I believe this is related to vmalloc changes between 2013-05-31 and
> >> > 2013-06-09
> >> > re-run the tests with
> >> > export VMALLOC_OPTIONS=getmem=safe
> >> > if that's the problem then it gives a clue on a general solution
> >> > details after confirmation
> >>
> >> timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
> >> while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
> >> "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
> >> Version AIJMP 93v- 2013-10-08
> >>
> >> real 34.60
> >> user 33.27
> >> sys   1.19
> >>
> >> VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
> >> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
> >> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
> >> "${.sh.version}" ; nanosort <xxx >yyy'
> >> Version AIJMP 93v- 2013-10-08
> >>
> >> real 15.34
> >> user 14.67
> >> sys   0.52
> >>
> >> So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
> >> correct.
> >>
> >> What does VMALLOC_OPTIONS=getmem=safe do?
> >
> > vmalloc has an internal discipline/method for getting memory from the system
> > several methods are available with varying degrees of thread safety etc.
> > see src/lib/libast/vmalloc/vmdcsystem.c for the code
> > and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
> > description (vmalloc.3 update shortly)
> >
> > **  getmemory=f   enable f[,g] getmemory() functions if supported, all by default
> > **      anon:        mmap(MAP_ANON)
> > **      break|sbrk:  sbrk()
> > **      native:      native malloc()
> > **      safe:        safe sbrk() emulation via mmap(MAP_ANON)
> > **      zero:        mmap(/dev/zero)
> >
> > i believe the performance regression with "anon" is that on linux
> > mmap(0....MAP_ANON|MAP_PRIVATE...), which lets the system decide the
> > address, returns adjacent (when possible) region addresses from highest
> > to lowest order
> > and the reverse order at minimum tends to fragment more memory
> > "zero" has the same hi=>lo characteristic
> > i suspect it adversely affects the vmalloc coalescing algorithm but have
> > not dug deeper
> > for now the probe order in vmalloc/vmdcsystem.c was simply changed to
> > favor "safe"
>
> Erm... since Irek prodded me by phone I looked at the issue...
> ... some observations first (on Solaris 11/Illumos):
>
> 1. /dev/zero allocator vs. |sbrk()| allocator on Solaris:
> -- snip --
> $ VMALLOC_OPTIONS=getmem=zero timex ~/bin/ksh -c 'function nanosort {
> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
> "${.sh.version}" ; nanosort <xxx >yyy'
> Version AIJMP 93v- 2013-10-08
>
> real 32.98
> user 32.55
> sys   0.32
>
> $ VMALLOC_OPTIONS=getmem=break timex ~/bin/ksh -c 'function nanosort {
> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
> "${.sh.version}" ; nanosort <xxx >yyy'
> Version AIJMP 93v- 2013-10-08
>
> real 1:08.41
> user 1:07.87
> sys   0.38
> -- snip --
> ... which means the |sbrk()| allocator is twice as slow as the
> /dev/zero allocator.
sbrk is different from safebreak -- look at the vmdcsystem.c code
the alpha will default to not probe just-mapped pages for overbooking
this will result in spurious and for the most part untraceable core dumps
on systems running out of memory

> 2. The default block size by the normal |mmap(MAP_ANON)| allocator is
> 1MB. This is IMHO far too small because there is not enough space
> for the coalescing algorithm to operate and a *lot* of fragmentation
> occurs.
> IMHO a _minimum_ page size of 4MB should be picked (as a side-effect
> the shell would get 4MB or 2MB largepages on platforms like Solaris
> automagically).

default block size upped to 4Mi
and pagesize=<n>[KMGP][i] can override in VMALLOC_OPTIONS for testing

> 3. After each |mmap(MAP_ANON)| allocation the libast allocator
> "manually" clears the obtained memory chunk with zero bytes. This is
> IMO a *major* source of wasted CPU time (>= ~30%-38% of a
> |_ast_malloc(1024*1024)|) because each memory page is instantiated by
> writing zeros to it. If the clearing could be avoided (which is
> unnecessary anyway) we'd easily win ~30%-38% and *not* instantiate
> pages which we do not use yet.

can you pinpoint the code that does this -- the only memset(0) i see
are due to explicit VM_RSZERO

> Just to make it clear: Allocating a 1MB chunk of memory via
> |mmap(MAP_ANON)| and a 128MB chunk of memory via |mmap(MAP_ANON)| has
> *no* (visible) difference in performance until we touch the pages via
> either read/execute or write accesses.
> Currently the libast allocator code writes zeros into the whole chunk
> of memory obtained via |mmap(MAP_ANON)|, which pretty much ruins
> performance because *all* pages are created physically instead of just
> being memory marked as "reserved". If libast would stop writing into
> memory chunks directly after the |mmap(MAP_ANON)| we could easily bump
> the allocation size up to 32MB or more without any performance
> penalty...
>
> ----
>
> Bye,
> Roland
>
> --
>   __ .  . __
>  (o.\ \/ /.o) [email protected]
>   \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
>   /O /==\ O\  TEL +49 641 3992797
>  (;O/ \/ \O;)
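the reserved-vs-instantiated distinction is also easy to demonstrate with
a standalone sketch (hypothetical test program touchcost.c, linux assumed
-- ru_maxrss is in KiB there, other units elsewhere, so the numbers are
indicative only):
-- snip --
/* touchcost.c -- mapping is cheap, touching the pages is what costs */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

static long rss(void)
{
	struct rusage	ru;

	getrusage(RUSAGE_SELF, &ru);
	return ru.ru_maxrss;
}

int main(void)
{
	size_t	len = 128UL * 1024 * 1024;	/* the 128MB case from the example above */
	void*	p;

	p = mmap(0, len, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
	if (p == MAP_FAILED)
	{
		perror("mmap");
		return 1;
	}
	printf("after mmap:   rss = %ld\n", rss());
	/* the explicit clear is what forces every page into physical
	 * existence -- MAP_ANON memory is already zero-filled by the kernel */
	memset(p, 0, len);
	printf("after memset: rss = %ld\n", rss());
	return 0;
}
-- snip --
the rss after mmap stays near the process baseline; only the memset forces
all 128MB into physical pages, and since the kernel hands out MAP_ANON
pages already zero-filled, a clear done right after the mmap is redundant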
