I didn't know about Jemalloc but just downloaded it and ran it with my sim
code. This is just a single data point, but Jemalloc seems slow and uses too
much memory.

The parameters of the run were:
-a10000000: 10,000,000 allocations.
-t18: eighteen threads.
-z10 and -Z1000: block sizes are random in the range [10,1000].
-l100: each allocated block lives for a number of steps randomly picked
        from [1,100], then maybe realloced or freed.
-e0.01: at each allocation step, there is a 1% chance that an ephemeral
        thread is created and run to allocate/free a small number of objects.
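
To make the parameters concrete, here is a rough single-threaded model of such a workload. It is only a sketch, not the actual sim code: the function name and arguments are hypothetical, and threads, realloc and the -e ephemeral threads are left out. It only tracks how many bytes are busy at each step and returns the peak.

```python
import heapq
import random

def simulate_peak_usage(n_allocs, zmin, zmax, max_life, seed=0):
    """Hypothetical model of the workload: returns peak busy bytes.

    Each step first frees the blocks whose lifetime has expired, then
    allocates one block with a random size in [zmin, zmax] and a random
    lifetime in [1, max_life] steps.
    """
    rng = random.Random(seed)
    expiring = []            # min-heap of (expiry_step, size)
    busy = peak = 0
    for step in range(n_allocs):
        # Free every block whose lifetime has run out.
        while expiring and expiring[0][0] <= step:
            busy -= heapq.heappop(expiring)[1]
        size = rng.randint(zmin, zmax)
        life = rng.randint(1, max_life)
        heapq.heappush(expiring, (step + life, size))
        busy += size
        peak = max(peak, busy)
    return peak
```

A scaled-down call such as simulate_peak_usage(100_000, 10, 1000, 100) gives the kind of number reported as "usage" below, though for one thread only.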

The numbers to look at below are "elapse time" and "efficiency". Efficiency is
defined as the ratio of the maximum amount of busy memory (ie, "usage") at any
given time to the total memory gotten from the OS (ie, "process") for
allocation, so higher is better.
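
As a small worked example of that ratio, using the "process" and "usage" figures from the report in this message (the function name is just for illustration):

```python
def efficiency(process_bytes, usage_bytes):
    """Peak busy memory divided by memory obtained from the OS."""
    return usage_bytes / process_bytes

# Figures copied from the report in this message.
print(f"vmalloc:  {efficiency(390082560, 320892840):.2f}")   # 0.82
print(f"jemalloc: {efficiency(562839552, 320892840):.2f}")   # 0.57
```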

The "self" line reports user+sys times and numbers of context switches of the
main process. Vmalloc allows shared memory regions concurrently allocated from
different processes. If we had done that, there would be more reporting data
for subprocesses. But here we only test a single process with multiple threads.

---------
te:g:.../Vmalloc_t$ runsafe -a10000000 -t18 -z10 -Z1000 -l100 -e0.01
t.malloc-vm
        elapse time=6.0339s memory[process=390082560, usage=320892840, efficiency=0.82]
        self: time=20.5539s[user=10.3794s, sys=10.1745s], csw=499145[voluntary=494822, forced=4323]

t.malloc-je
        elapse time=9.0062s memory[process=562839552, usage=320892840, efficiency=0.57]
        self: time=33.5089s[user=16.7435s, sys=16.7655s], csw=789582[voluntary=783360, forced=6222]
--------
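
The "self" lines can be cross-checked mechanically: time should be user+sys, and csw should be voluntary+forced. The sketch below assumes the line format shown above; the helper names are made up for illustration and are not part of the test program.

```python
import re

def parse_self_line(line):
    """Extract the name=number fields of a "self:" report line into a dict."""
    pairs = re.findall(r"(\w+)=([\d.]+)", line)
    return {key: float(value) for key, value in pairs}

def is_consistent(fields, tol=1e-3):
    """Check time == user + sys (within rounding) and csw == voluntary + forced."""
    time_ok = abs(fields["time"] - (fields["user"] + fields["sys"])) < tol
    csw_ok = fields["csw"] == fields["voluntary"] + fields["forced"]
    return time_ok and csw_ok

vm = parse_self_line("self: time=20.5539s[user=10.3794s, sys=10.1745s], "
                     "csw=499145[voluntary=494822, forced=4323]")
je = parse_self_line("self: time=33.5089s[user=16.7435s, sys=16.7655s], "
                     "csw=789582[voluntary=783360, forced=6222]")
print(is_consistent(vm), is_consistent(je))
```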

Vmalloc was faster than Jemalloc (6.03s vs 9.01s elapsed, about 1.5 times),
but in a real application that difference might be minimal. I guess the
alarming part in this test run was that the efficiency of Jemalloc was just
57%, meaning that it was taking roughly 1.75 times the maximum busy memory
that the process needed in system memory. Vmalloc's efficiency was 82%, or
roughly 1.2 times. The more threads, the more fragmentation will occur, but a
good malloc needs to manage this, or else things can turn out badly for large
and long-running applications. Jemalloc may need more work there.
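
For the arithmetic behind those statements, the overhead is just the inverse of the efficiency, computed here from the figures in the report (the function name is only illustrative):

```python
def overhead_factor(process_bytes, usage_bytes):
    """OS memory held per byte of peak busy memory (1 / efficiency)."""
    return process_bytes / usage_bytes

usage = 320892840  # same peak busy memory in both runs
print(f"vmalloc overhead:  {overhead_factor(390082560, usage):.2f}x")   # 1.22x
print(f"jemalloc overhead: {overhead_factor(562839552, usage):.2f}x")   # 1.75x
print(f"elapsed ratio je/vm: {9.0062 / 6.0339:.2f}")                    # 1.49
```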

Do you know if the test suite for Jemalloc is available publicly? What was the
meaning of the table of numbers in your email?

Phong


> From olga.kryzhanov...@gmail.com Wed Aug 15 16:39:25 2012
> Subject: Fwd: jemalloc
> To: ast-developers@research.att.com, Phong Vo <k...@research.att.com>

> Phong, how does the new AST vmalloc compare to jemalloc?

> Olga

> ---------- Forwarded message ----------
> From: Nicholas Clark <n...@ccl4.org>
> Date: Wed, Aug 15, 2012 at 9:27 PM
> Subject: jemalloc
> To: perl5-port...@perl.org

> Artur and Tim Bunce suggested investigating jemalloc, which is a high
> performance malloc implementation now used by (among others) FreeBSD and
> Facebook. Artur also suggested that our use of arenas of memory (for SV bodies)
> is no longer the best idea, given that malloc() implementations have got
> better. Fortunately arenas are easy to disable, by compiling with -DPURIFY.

> So here is a comparison of blead (on dromedary, -Os, no threads), default,
> compiled with -DPURIFY, default using an LD_PRELOAD to force the use of
> jemalloc 3.0.0, and finally compiled with -DPURIFY and using jemalloc.

> Not having anything fantastically better to hand, this is perlbench, with
> each of the 4 run twice.

> IIRC smaller numbers are better, and anything less than 5% is probably noise:

>                          A     B     C     D     E     F     G     H
>                        ---   ---   ---   ---   ---   ---   ---   ---
> arith/mixed            100   101   101    98   102    98   101   101
> arith/trig             100   101   101    99   100    98    99   100
> array/copy             100   101    95   101   101   100   102   100
> array/foreach          100    79   102    76   101    76   101    79
> array/index            100   112   101   105   100   112   101   110
> array/pop              100   103   100   100   102   102   102   102
> array/shift            100   101    97    98   100   100   101   100
> array/sort-num         100   103   100   103   100   103   100   102
> array/sort             100    87    98    84   100    84    97    87
> call/0arg              100   111   100   104   107   104   102   108
> call/1arg              100    99   103    96   106    96   103    97
> call/2arg              100   105    97    99    96   100    95   103
> call/9arg              100   103    98   102   101    94    99   103
> call/empty             100   102    99   102    99    97    96   103
> call/fib               100   100   100    97    97   100   101   101
> call/method            100   106   101   102    97   104   100   105
> call/wantarray         100   109    98   101   100   102    98   110
> hash/copy              100    85   102    81   101    78   104    88
> hash/each              100    94   102    88    85    88   102    93
> hash/foreach-sort      100    97    99    97   100    94   101    96
> hash/foreach           100    96    98    95   103    93   102    94
> hash/get               100   101    98   101   100   102   101   102
> hash/set               100    96   102   102   100   101   102    91
> loop/for-c             100   106   111   105   101   106   109   106
> loop/for-range-const   100    99    99    97    96    97    94    98
> loop/for-range         100   100   101    92    99    98    99    99
> loop/getline           100   104    98   104   100   103   100   104
> loop/while-my          100   103   101    99   100   101    99    99
> loop/while             100    71   100    96    96    98   101    99
> re/const               100    99    99    99   100    97    99    99
> re/w                   100    99   100   101    98   100   101    97
> startup/fewmod         100    98    99    97   100    96    98    98
> startup/lotsofsub      100    98   100    98   100    98   100    98
> startup/noprog         100   101    79    79   100    79    79   100
> string/base64          100   100    99    99   100   100    98    99
> string/htmlparser      100    98   108   105   100   105   107    98
> string/index-const     100   100    98   100   100   101    99   101
> string/index-var       100   100    98    99   100   100   100    99
> string/ipol            100   108   107   107   108   106   108   106
> string/tr              100   101   100   101    99   101   101   102

> AVERAGE                100    99   100    98   100    98   100    99

> ed2b02642a84b031        A                       A
> +PURIFY                       B                                   B
>         +jemalloc                   C                       C
> +PURIFY +jemalloc                         D           D

> It's not much, so I'm not sure if it's noise or "signal". If it's signal,
> it's suggesting that glibc malloc is fractionally better than using arenas,
> and jemalloc fractionally better still. But not much. (And that with arenas,
> malloc doesn't seem to matter)

> Would anyone like to pursue this further?

> jemalloc is BSD licensed, actively maintained and likely to improve, so
> potentially we could ship it as a replacement for the current malloc.c

> However, I'm not sure how easy it would be to integrate. We're not in a
> position to enforce the use of LD_PRELOAD to swap out the libc malloc, so
> just like the current malloc.c we'd have to do a bit more to rename the
> symbols, and to play nicely with the system malloc, particularly if both
> use sbrk().

> Nicholas Clark

> -- 
>       ,   _                                    _   ,
>      { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
> .----'-/`-/     olga.kryzhanov...@gmail.com   \-`\-'----.
>  `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
>       /\/\     Solaris/BSD//C/C++ programmer   /\/\
>       `--`                                      `--`

