On Fri, Dec 6, 2013 at 12:41 AM, Roland Mainz <roland.ma...@nrubsig.org> wrote:
> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <glenn.s.fow...@gmail.com> wrote:
>> On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <lionelcons1...@gmail.com> wrote:
>>> On 1 December 2013 17:26, Glenn Fowler <glenn.s.fow...@gmail.com> wrote:
[snip]
> 2. The default block size of the normal |mmap(MAP_ANON)| allocator is
> 1MB. This is IMHO far too small because there is IMO not enough space
> for the coalescing algorithm to operate, and a *lot* of fragmentation
> occurs.
> IMHO a _minimum_ page size of 4MB should be picked (as a side effect
> the shell would get 4MB or 2MB largepages on platforms like Solaris
> automagically).
>
> 3. After each |mmap(MAP_ANON)| allocation the libast allocator
> "manually" clears the obtained memory chunk with zero bytes. This is
> IMO a *major* source of wasted CPU time (>= ~30%-38% of a
> |_ast_malloc(1024*1024)|) because each memory page is instantiated by
> writing zeros to it. If the clearing could be avoided (it is
> unnecessary anyway, since anonymous mappings are already zero-filled)
> we'd easily win that ~30%-38% and would *not* instantiate pages we do
> not use yet.
>
> Just to make it clear: allocating a 1MB chunk of memory via
> |mmap(MAP_ANON)| and a 128MB chunk via |mmap(MAP_ANON)| shows *no*
> (visible) difference in performance until we touch the pages via
> read/execute or write accesses.
> Currently the libast allocator code writes zeros into the whole chunk
> of memory obtained via |mmap(MAP_ANON)|, which pretty much ruins
> performance because *all* pages are created physically instead of just
> being memory marked as "reserved". If libast stopped writing into
> memory chunks directly after the |mmap(MAP_ANON)| we could easily bump
> the allocation size up to 32MB or more without any performance
> penalty...
BTW: A quick fix for the original problem seems to be the following patch:
-- snip --
diff -r -u src/lib/libast/vmalloc/vmhdr.h src/lib/libast/vmalloc/vmhdr.h
--- src/lib/libast/vmalloc/vmhdr.h      2013-08-27 18:44:46.000000000 +0200
+++ src/lib/libast/vmalloc/vmhdr.h      2013-12-06 01:06:30.777622210 +0100
@@ -182,7 +182,7 @@
 /* hint to regulate memory requests to discipline functions */
 #if _ast_sizeof_size_t > 4 /* the address space is greater than 32-bit */
-#define VM_INCREMENT   (1024*1024)     /* lots of memory available here */
+#define VM_INCREMENT   (32*1024*1024)  /* lots of memory available here */
 #else
 #define VM_INCREMENT   (64*1024)       /* perhaps more limited memory */
 #endif
-- snip --

It turns out that the issue is mainly fragmentation-related for the
"nanosort" testcase. After applying the patch above the runtime
*significantly* improves - it is even three seconds better than with
the old ksh93 version:
-- snip --
$ VMALLOC_OPTIONS=getmem=anon timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0 ; while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08

real 13.96
user 13.08
sys   0.44
-- snip --

----

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)

_______________________________________________
ast-users mailing list
ast-users@lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-users