On Fri, Dec 6, 2013 at 12:41 AM, Roland Mainz <[email protected]> wrote:
> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <[email protected]> wrote:
>> On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <[email protected]>
>> wrote:
>>>
>>> On 1 December 2013 17:26, Glenn Fowler <[email protected]> wrote:
[snip]
> 2. The default block size used by the normal |mmap(MAP_ANON)|
> allocator is 1MB. This is IMHO far too small: there is IMO not
> enough room for the coalescing algorithm to operate, and a *lot* of
> fragmentation occurs.
> IMHO a _minimum_ block size of 4MB should be picked (as a
> side-effect the shell would get 4MB or 2MB largepages on platforms
> like Solaris automagically).
>
> 3. After each |mmap(MAP_ANON)| allocation the libast allocator
> "manually" clears the obtained memory chunk with zero bytes. This is
> IMO a *major* waste of CPU time (>= ~30%-38% of a
> |_ast_malloc(1024*1024)|) because writing the zeros forces every
> memory page to be instantiated. If the clearing were avoided (it is
> unnecessary anyway, since anonymous mappings are already zero-filled
> by the kernel) we'd easily win that ~30%-38% and would *not*
> instantiate pages we do not use yet.
>
> Just to make it clear: allocating a 1MB chunk of memory via
> |mmap(MAP_ANON)| and a 128MB chunk via |mmap(MAP_ANON)| show
> *no* (visible) difference in performance until we touch the pages
> via read/execute or write accesses.
> Currently the libast allocator writes zeros into the whole chunk
> obtained via |mmap(MAP_ANON)|, which pretty much ruins performance
> because *all* pages are created physically instead of just being
> marked as "reserved". If libast stopped writing into memory chunks
> directly after the |mmap(MAP_ANON)| we could easily bump the
> allocation size up to 32MB or more without any performance
> penalty...
BTW: A quick fix for the original problem seems to be the following patch:
-- snip --
diff -r -u src/lib/libast/vmalloc/vmhdr.h src/lib/libast/vmalloc/vmhdr.h
--- src/lib/libast/vmalloc/vmhdr.h 2013-08-27 18:44:46.000000000 +0200
+++ src/lib/libast/vmalloc/vmhdr.h 2013-12-06 01:06:30.777622210 +0100
@@ -182,7 +182,7 @@
/* hint to regulate memory requests to discipline functions */
#if _ast_sizeof_size_t > 4 /* the address space is greater than 32-bit */
-#define VM_INCREMENT (1024*1024) /* lots of memory available here */
+#define VM_INCREMENT (32*1024*1024) /* lots of memory available here */
#else
#define VM_INCREMENT (64*1024) /* perhaps more limited memory */
#endif
-- snip --
It turns out that the issue is mainly fragmentation-related for the
"nanosort" testcase. After applying the patch above the runtime
improves *significantly* - it even beats the old ksh93 version by
three seconds:
-- snip --
$ VMALLOC_OPTIONS=getmem=anon timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 13.96
user 13.08
sys 0.44
-- snip --
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) [email protected]
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)
_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users