Vmalloc disciplines do not zero out memory. The only explicit zeroing of
memory occurs in vmresize(), and only when the VM_RSZERO flag is set, or in
the malloc-compatible calloc() call.
Phong
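
For reference, a minimal sketch (not vmalloc code) of why no explicit
clearing is needed after mmap(MAP_ANON): on Linux and Solaris the kernel
already hands out zero-filled anonymous pages. The helper name
fresh_mapping_is_zero() is hypothetical and exists only for this
illustration.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/*
 * Hypothetical helper, not part of libast/vmalloc.
 * Maps `len` bytes of private anonymous memory, checks that every byte
 * reads back as zero without any memset(), then unmaps the region.
 * Returns 1 if all bytes were zero, 0 if a nonzero byte was found,
 * and -1 if mmap() itself failed.
 */
static int fresh_mapping_is_zero(size_t len)
{
    unsigned char *p;
    size_t i;
    int all_zero = 1;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Note: reading here touches every page; real code would not do this. */
    for (i = 0; i < len; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(p, len);
    return all_zero;
}
```

Calling fresh_mapping_is_zero(1024 * 1024) should return 1 on such systems.
The check loop itself touches every page, so it only demonstrates the
zero-fill guarantee; it says nothing about allocation cost.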


On Fri, Dec 6, 2013 at 10:18 AM, Glenn Fowler <[email protected]> wrote:

> On Thu, Dec 5, 2013 at 6:41 PM, Roland Mainz <[email protected]> wrote:
>
>> On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <[email protected]>
>> wrote:
>> > On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <[email protected]>
>> > wrote:
>> >>
>> >> On 1 December 2013 17:26, Glenn Fowler <[email protected]> wrote:
>> >> > I believe this is related to vmalloc changes between 2013-05-31 and
>> >> > 2013-06-09
>> >> > re-run the tests with
>> >> > export VMALLOC_OPTIONS=getmem=safe
>> >> > if that's the problem then it gives a clue on a general solution
>> >> > details after confirmation
>> >> >
>> >>
>> >> timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
>> >> while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
>> >> "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >>
>> >> real          34.60
>> >> user          33.27
>> >> sys            1.19
>> >>
>> >> VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
>> >> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> >> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> >> "${.sh.version}" ; nanosort <xxx >yyy'
>> >> Version AIJMP 93v- 2013-10-08
>> >> real          15.34
>> >> user          14.67
>> >> sys            0.52
>> >>
>> >> So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
>> >> correct.
>> >>
>> >> What does VMALLOC_OPTIONS=getmem=safe do?
>> >
>> >
>> > vmalloc has an internal discipline/method for getting memory from the
>> > system
>> > several methods are available with varying degrees of thread safety etc.
>> > see src/lib/libast/vmalloc/vmdcsystem.c for the code
>> > and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
>> > description (vmalloc.3 update shortly)
>> >
>> > **          getmemory=f enable f[,g] getmemory() functions if supported,
>> > **                      all by default
>> > **                          anon: mmap(MAP_ANON)
>> > **                          break|sbrk: sbrk()
>> > **                          native: native malloc()
>> > **                          safe: safe sbrk() emulation via mmap(MAP_ANON)
>> > **                          zero: mmap(/dev/zero)
>> >
>> > i believe the performance regression with "anon" is that on linux
>> > mmap(0....MAP_ANON|MAP_PRIVATE...), which lets the system decide the
>> > address, returns adjacent (when possible) region addresses from highest
>> > to lowest order
>> > and the reverse order at minimum tends to fragment more memory
>> > "zero" has the same hi=>lo characteristic
>> > i suspect it adversely affects the vmalloc coalescing algorithm but have
>> > not dug deeper
>> > for now the probe order in vmalloc/vmdcsystem.c was simply changed to
>> > favor "safe"
>>
>> Erm... since Irek prodded me by phone I looked at the issue...
>> ... some observations first (on Solaris 11/Illumos):
>>
>> 1. /dev/zero allocator vs. |sbrk()| allocator on Solaris:
>> -- snip --
>> $ VMALLOC_OPTIONS=getmem=zero timex ~/bin/ksh -c 'function nanosort {
>> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> "${.sh.version}" ; nanosort <xxx >yyy'
>> Version AIJMP 93v- 2013-10-08
>>
>> real          32.98
>> user          32.55
>> sys            0.32
>>
>> $ VMALLOC_OPTIONS=getmem=break timex ~/bin/ksh -c 'function nanosort {
>> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> "${.sh.version}" ; nanosort <xxx >yyy'
>> Version AIJMP 93v- 2013-10-08
>>
>> real        1:08.41
>> user        1:07.87
>> sys            0.38
>> -- snip --
>> ... which means the |sbrk()| allocator is twice as slow as the
>> /dev/zero allocator.
>>
>
> sbrk is different from safebreak -- look at the vmdcsystem.c code
> the alpha will default to not probe just mapped pages for overbooking
> this will result in spurious and for the most part untraceable core dumps
> on systems running out of memory
>
>
>> 2. The default block size by the normal |mmap(MAP_ANON)| allocator is
>> 1MB. This is IMHO far too small because there is IMO not enough space
>> for the coalescing algorithm to operate and a *lot* of fragmentation
>> occurs.
>> IMHO a _minimum_ page size of 4MB should be picked (as a side-effect
>> the shell would get 4MB or 2MB largepages on platforms like Solaris
>> automagically).
>>
>
> default block size upped to 4Mi and pagesize=<n>[KMGP][i] can override in
> VMALLOC_OPTIONS for testing
>
>>
>> 3. After each |mmap(MAP_ANON)| allocation the libast allocator
>> "manually" clears the obtained memory chunk with zero bytes. This is
>> IMO a *major* source of wasting CPU time (>= ~30%-38% of a
>> |_ast_malloc(1024*1024)|) because each memory page is instantiated by
>> writing zeros to it. If the clearing could be avoided (which is
>> unnecessary anyway) we'd easily win ~30%-38% and do *not* instantiate
>> pages which we do not use yet.
>>
>
> can you pinpoint the code that does this -- the only memset(0) i see are
> due to explicit VM_RSZERO
>
>> Just to make it clear: Allocating a 1MB chunk of memory via
>> |mmap(MAP_ANON)| and a 128MB chunk of memory via |mmap(MAP_ANON)| has
>> *no* (visible) difference in performance until we touch the pages via
>> either read/execute or write accesses.
>> Currently the libast allocator code writes zeros into the whole chunk
>> of memory obtained via |mmap(MAP_ANON)| which pretty much ruins
>> performance because *all* pages are created physically instead of just
>> being some memory marked as "reserved". If libast would stop writing
>> into memory chunks directly after the |mmap(MAP_ANON)| we could easily
>> bump the allocation size up to 32MB or better without any performance
>> penalty...
>>
>> ----
>>
>> Bye,
>> Roland
>>
>> --
>>   __ .  . __
>>  (o.\ \/ /.o) [email protected]
>>   \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
>>   /O /==\ O\  TEL +49 641 3992797
>>  (;O/ \/ \O;)
>>
>
>
_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users
