On Wed, Dec 4, 2013 at 3:02 PM, Glenn Fowler <[email protected]> wrote:
> On Sun, Dec 1, 2013 at 4:58 PM, Lionel Cons <[email protected]>
> wrote:
>>
>> On 1 December 2013 17:26, Glenn Fowler <[email protected]> wrote:
>> > I believe this is related to vmalloc changes between 2013-05-31 and
>> > 2013-06-09
>> > re-run the tests with
>> > export VMALLOC_OPTIONS=getmem=safe
>> > if that's the problem then it gives a clue on a general solution
>> > details after confirmation
>> >
>>
>> timex ~/bin/ksh -c 'function nanosort { typeset -A a ; integer k=0;
>> while read i ; do key="$i$((k++))" ; a["$key"]="$i" ; done ; printf
>> "%s\n" "${a[@]}" ; } ; print "${.sh.version}" ; nanosort <xxx >yyy'
>> Version AIJMP 93v- 2013-10-08
>>
>> real 34.60
>> user 33.27
>> sys 1.19
>>
>> VMALLOC_OPTIONS=getmem=safe timex ~/bin/ksh -c 'function nanosort {
>> typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
>> a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
>> "${.sh.version}" ; nanosort <xxx >yyy'
>> Version AIJMP 93v- 2013-10-08
>> real 15.34
>> user 14.67
>> sys 0.52
>>
>> So your hunch that VMALLOC_OPTIONS=getmem=safe fixes the problem is
>> correct.
>>
>> What does VMALLOC_OPTIONS=getmem=safe do?
>
>
> vmalloc has an internal discipline/method for getting memory from the system
> several methods are available with varying degrees of thread safety etc.
> see src/lib/libast/vmalloc/vmdcsystem.c for the code
> and src/lib/libast/vmalloc/malloc.c for the latest VMALLOC_OPTIONS
> description (vmalloc.3 update shortly)
>
> ** getmemory=f enable f[,g] getmemory() functions if supported, all
> by default
> ** anon: mmap(MAP_ANON)
> ** break|sbrk: sbrk()
> ** native: native malloc()
> ** safe: safe sbrk() emulation via mmap(MAP_ANON)
> ** zero: mmap(/dev/zero)
>
> i believe the performance regression with "anon" is that on linux
> mmap(0....MAP_ANON|MAP_PRIVATE...),
> which lets the system decide the address, returns adjacent (when possible)
> region addresses from highest to lowest order
> and the reverse order at minimum tends to fragment more memory
> "zero" has the same hi=>lo characteristic
> i suspect it adversely affects the vmalloc coalescing algorithm but have not
> dug deeper
> for now the probe order in vmalloc/vmdcsystem.c was simply changed to favor
> "safe"
Erm... since Irek prodded me by phone I looked at the issue...
... some observations first (on Solaris 11/Illumos):
1. /dev/zero allocator vs. |sbrk()| allocator on Solaris:
-- snip --
$ VMALLOC_OPTIONS=getmem=zero timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 32.98
user 32.55
sys 0.32
$ VMALLOC_OPTIONS=getmem=break timex ~/bin/ksh -c 'function nanosort {
typeset -A a ; integer k=0; while read i ; do key="$i$((k++))" ;
a["$key"]="$i" ; done ; printf "%s\n" "${a[@]}" ; } ; print
"${.sh.version}" ; nanosort <xxx >yyy'
Version AIJMP 93v- 2013-10-08
real 1:08.41
user 1:07.87
sys 0.38
-- snip --
... which means the |sbrk()| allocator is about twice as slow as the
/dev/zero allocator (1:08.41 vs. 32.98 real).
2. The default block size used by the normal |mmap(MAP_ANON)| allocator
is 1MB. This is IMHO far too small because there is IMO not enough
space for the coalescing algorithm to operate and a *lot* of
fragmentation occurs.
IMHO a _minimum_ block size of 4MB should be picked (as a side effect
the shell would get 4MB or 2MB largepages on platforms like Solaris
automagically).
3. After each |mmap(MAP_ANON)| allocation the libast allocator
"manually" clears the obtained memory chunk with zero bytes. This is
IMO a *major* waste of CPU time (>= ~30%-38% of a
|_ast_malloc(1024*1024)|) because each memory page is instantiated by
writing zeros to it. If the clearing could be avoided (it is
unnecessary anyway, since memory from |mmap(MAP_ANON)| is already
zero-filled) we'd easily win ~30%-38% and would *not* instantiate
pages which we do not use yet.
Just to make it clear: Allocating a 1MB chunk of memory via
|mmap(MAP_ANON)| and a 128MB chunk of memory via |mmap(MAP_ANON)| show
*no* (visible) difference in performance until we touch the pages via
either read/execute or write accesses.
Currently the libast allocator code writes zeros into the whole chunk
of memory obtained via |mmap(MAP_ANON)|, which pretty much ruins
performance because *all* pages are created physically instead of just
being some memory marked as "reserved". If libast stopped writing
into memory chunks directly after the |mmap(MAP_ANON)| we could easily
bump the allocation size up to 32MB or more without any performance
penalty...
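
A minimal C sketch of the point above. This is not libast code:
reserve_anon() and samples_are_zero() are hypothetical helper names
made up for illustration. It assumes a platform that provides MAP_ANON
(or MAP_ANONYMOUS) and hands out zero-filled anonymous mappings, so an
explicit clear right after the |mmap()| adds nothing and only forces
every page to be instantiated.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Hypothetical helper (not part of libast): reserve `size` bytes of
 * anonymous memory and let the system choose the address. No page is
 * written here, so nothing is instantiated yet.
 */
void *reserve_anon(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: read one byte every `step` bytes and report
 * whether all sampled bytes are zero (1) or not (0). Reading counts
 * as touching the page, but it does not dirty it.
 */
int samples_are_zero(const unsigned char *p, size_t size, size_t step)
{
    size_t i;

    assert(step > 0);            /* a zero step would never terminate */
    for (i = 0; i < size; i += step)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

Calling reserve_anon(32 * 1024 * 1024) and then
samples_are_zero(p, size, 4096) should report zeros without any
memset() having been issued; the chunk is released again with
munmap(p, size).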
----
Bye,
Roland
--
__ . . __
(o.\ \/ /.o) [email protected]
\__\/\/__/ MPEG specialist, C&&JAVA&&Sun&&Unix programmer
/O /==\ O\ TEL +49 641 3992797
(;O/ \/ \O;)