On Fri, Dec 6, 2013 at 12:41 AM, Roland Mainz <roland.ma...@nrubsig.org> wrote:
> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <glenn.s.fow...@gmail.com> wrote:
>> On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <lionelcons1...@gmail.com>
>> wrote:
>>>
>>> On 1 December 2013 17:26, Glenn Fowler <glenn.s.fow...@gmail.com> wrote:
[snip]
> 2. The default block size of the normal |mmap(MAP_ANON)| allocator is
> 1MB. This is IMHO far too small because there is IMO not enough space
> for the coalescing algorithm to operate, and a *lot* of fragmentation
> occurs.
> IMHO a _minimum_ block size of 4MB should be picked (as a side effect
> the shell would get 4MB or 2MB largepages on platforms like Solaris
> automagically).
>
> 3. After each |mmap(MAP_ANON)| allocation the libast allocator
> "manually" clears the obtained memory chunk with zero bytes. This is
> IMO a *major* source of wasted CPU time (>= ~30%-38% of a
> |_ast_malloc(1024*1024)| call) because each memory page is instantiated
> by writing zeros to it. If the clearing were avoided (it is unnecessary
> anyway, since |MAP_ANON| memory is already zero-filled by the kernel)
> we'd easily win ~30%-38% and would *not* instantiate pages which we do
> not use yet.
>
> Just to make it clear: there is *no* (visible) difference in
> performance between allocating a 1MB chunk of memory via
> |mmap(MAP_ANON)| and allocating a 128MB chunk the same way, until we
> touch the pages via read/execute or write accesses.
> Currently the libast allocator code writes zeros into the whole chunk
> of memory obtained via |mmap(MAP_ANON)|, which pretty much ruins
> performance because *all* pages are created physically instead of just
> being address space marked as "reserved". If libast stopped writing
> into memory chunks directly after the |mmap(MAP_ANON)| we could easily
> bump the allocation size up to 32MB or more without any performance
> penalty...

BTW: A quick fix for the original problem seems to be the following patch:
-- snip --
diff -r -u src/lib/libast/vmalloc/vmhdr.h src/lib/libast/vmalloc/vmhdr.h
--- src/lib/libast/vmalloc/vmhdr.h 2013-08-27 18:44:46.000000000 +0200
+++ src/lib/libast/vmalloc/vmhdr.h       2013-12-06 01:06:30.777622210 +0100
@@ -182,7 +182,7 @@

 /* hint to regulate memory requests to discipline functions */
 #if _ast_sizeof_size_t > 4 /* the address space is greater than 32-bit */
-#define VM_INCREMENT   (1024*1024) /* lots of memory available here    */
+#define VM_INCREMENT   (32*1024*1024) /* lots of memory available here */
 #else
 #define VM_INCREMENT   (64*1024)  /* perhaps more limited memory       */
 #endif
-- snip --

It turns out that the issue is mainly fragmentation-related for the
"nanosort" testcase. After applying the patch above the runtime
improves *significantly* - it even comes out three seconds faster than
the old ksh93 version:
-- snip --
$ VMALLOC_OPTIONS=getmem=anon timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08

real          13.96
user          13.08
sys            0.44
-- snip --

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-users mailing list
ast-users@lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-users