Phong, the mail was from Nicholas Clark <n...@ccl4.org>, which I
forwarded from the perl-porters list. I just found it interesting how
the new AST vmalloc compares to the competition.

Nicholas, can you answer Phong's question (below), please? You may
need to subscribe to
https://mailman.research.att.com/mailman/listinfo/ast-developers/
first.

Olga

On Thu, Aug 16, 2012 at 12:02 AM, Phong Vo <k...@research.att.com> wrote:
>
> I didn't know about jemalloc but just downloaded it and ran it with my sim
> code. This is just a single data point, but jemalloc seems slow and uses too
> much memory.
>
> The parameters of the run were:
> -a10000000: 10,000,000 allocations.
> -t18: eighteen threads.
> -z10 and -Z1000: block sizes are random in the range [10,1000].
> -l100: each allocated block lives for a number of steps randomly picked from
>        [1,100], then maybe realloc'd or freed.
> -e0.01: at each allocation step, there is a 1% chance that an ephemeral
>        thread is created and run to allocate/free a small number of objects.
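>
> In outline, each run boils down to something like the sketch below (all names
> here are made up for illustration; this is not the actual sim code, and the
> ephemeral-thread and timing/accounting parts are left out):
>
>     /* Minimal multi-threaded malloc stress sketch: each worker does a fixed
>      * number of allocation steps with block sizes drawn from [MINZ,MAXZ].
>      * A block stays live until its slot is picked again, which crudely
>      * approximates lifetimes of up to LIFE steps. */
>     #include <stdlib.h>
>     #include <string.h>
>     #include <pthread.h>
>
>     #define NALLOC  1000000   /* allocation steps per thread (cf. -a) */
>     #define NTHREAD 18        /* worker threads (cf. -t18)            */
>     #define MINZ    10        /* cf. -z10                             */
>     #define MAXZ    1000      /* cf. -Z1000                           */
>     #define LIFE    100       /* cf. -l100                            */
>
>     static void *worker(void *arg)
>     {
>         unsigned seed = (unsigned)(size_t)arg;
>         void *live[LIFE] = { 0 };                  /* currently live blocks */
>
>         for (long i = 0; i < NALLOC; i++) {
>             int    slot = rand_r(&seed) % LIFE;
>             size_t size = MINZ + rand_r(&seed) % (MAXZ - MINZ + 1);
>             void  *blk;
>
>             if (live[slot] && rand_r(&seed) % 2)   /* sometimes realloc,     */
>                 blk = realloc(live[slot], size);   /* otherwise free+malloc  */
>             else {
>                 free(live[slot]);
>                 live[slot] = 0;
>                 blk = malloc(size);
>             }
>             if (blk) {
>                 memset(blk, 0, size);              /* touch the memory */
>                 live[slot] = blk;
>             }
>         }
>         for (int s = 0; s < LIFE; s++)
>             free(live[s]);
>         return 0;
>     }
>
>     int main(void)
>     {
>         pthread_t tid[NTHREAD];
>
>         for (int t = 0; t < NTHREAD; t++)
>             pthread_create(&tid[t], 0, worker, (void *)(size_t)(t + 1));
>         for (int t = 0; t < NTHREAD; t++)
>             pthread_join(tid[t], 0);
>         return 0;
>     }
>
> Comparing allocators then comes down to timing that and watching how much
> memory the OS hands the process.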
>
> The numbers to look at below are "elapse time" and "efficiency". Efficiency
> is defined as the ratio of the maximum amount of busy memory (ie, "usage") at
> any given time to the total memory gotten from the OS (ie, "process") for
> allocation.
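> (For example, in the vmalloc run below, 320892840/390082560 gives an
> efficiency of about 0.82; in the jemalloc run, 320892840/562839552 gives
> about 0.57.)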
>
> The "self" line reports user+sys times and numbers of context switches of the 
> main process.
> Vmalloc allows shared memory regions concurrently allocated from different 
> processes.
> If we had done that, there would be more reporting data for subprocesses. But 
> here we only
> test single process and multiple threads.
>
> ---------
> te:g:.../Vmalloc_t$ runsafe -a10000000 -t18 -z10 -Z1000 -l100 -e0.01
> t.malloc-vm
>         elapse time=6.0339s memory[process=390082560, usage=320892840, efficiency=0.82]
>         self: time=20.5539s[user=10.3794s, sys=10.1745s], csw=499145[voluntary=494822, forced=4323]
>
> t.malloc-je
>         elapse time=9.0062s memory[process=562839552, usage=320892840, efficiency=0.57]
>         self: time=33.5089s[user=16.7435s, sys=16.7655s], csw=789582[voluntary=783360, forced=6222]
> --------
>
> Vmalloc was faster than jemalloc, but in a real application that difference
> might be minimal. I guess the alarming part of this test run was that the
> efficiency of jemalloc was just 57%, meaning that it was taking about 1.75
> times the amount of system memory to manage the maximum busy memory that the
> process needed. Vmalloc's efficiency was 82%. The more threads there are, the
> more fragmentation will occur, but a good malloc needs to manage this;
> otherwise, for large and long-running applications, things can turn out
> badly. jemalloc may need more work there.
>
> Do you know if the test suite for jemalloc is available publicly? What was
> the meaning of the table of numbers in your email?
>
> Phong
>
>
>> From olga.kryzhanov...@gmail.com Wed Aug 15 16:39:25 2012
>> Subject: Fwd: jemalloc
>> To: ast-developers@research.att.com, Phong Vo <k...@research.att.com>
>
>> Phong, how does the new AST vmalloc compare to jemalloc?
>
>> Olga
>
>> ---------- Forwarded message ----------
>> From: Nicholas Clark <n...@ccl4.org>
>> Date: Wed, Aug 15, 2012 at 9:27 PM
>> Subject: jemalloc
>> To: perl5-port...@perl.org
>
>> Artur and Tim Bunce suggested investigating jemalloc, which is a
>> high-performance malloc implementation now used by (among others) FreeBSD
>> and Facebook. Artur also suggested that our use of arenas of memory (for SV
>> bodies) is no longer the best idea, given that malloc() implementations have
>> got better. Fortunately arenas are easy to disable, by compiling with
>> -DPURIFY.
>
>> So here is a comparison of blead (on dromedary, -Os, no threads): the default
>> build, a build compiled with -DPURIFY, the default build using an LD_PRELOAD
>> to force the use of jemalloc 3.0.0, and finally a build compiled with
>> -DPURIFY and using jemalloc.
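>> (Here "using an LD_PRELOAD" means running the otherwise-unmodified perl
>> under something like LD_PRELOAD=/path/to/libjemalloc.so, with the library
>> path being whatever the jemalloc 3.0.0 build produced; no relinking is
>> needed.)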
>
>> Not having anything fantastically better to hand, this is perlbench, with
>> each of the 4 configurations run twice.
>
>> IIRC smaller numbers are better, and anything less than 5% is probably noise:
>
>>                          A     B     C     D     E     F     G     H
>>                        ---   ---   ---   ---   ---   ---   ---   ---
>> arith/mixed            100   101   101    98   102    98   101   101
>> arith/trig             100   101   101    99   100    98    99   100
>> array/copy             100   101    95   101   101   100   102   100
>> array/foreach          100    79   102    76   101    76   101    79
>> array/index            100   112   101   105   100   112   101   110
>> array/pop              100   103   100   100   102   102   102   102
>> array/shift            100   101    97    98   100   100   101   100
>> array/sort-num         100   103   100   103   100   103   100   102
>> array/sort             100    87    98    84   100    84    97    87
>> call/0arg              100   111   100   104   107   104   102   108
>> call/1arg              100    99   103    96   106    96   103    97
>> call/2arg              100   105    97    99    96   100    95   103
>> call/9arg              100   103    98   102   101    94    99   103
>> call/empty             100   102    99   102    99    97    96   103
>> call/fib               100   100   100    97    97   100   101   101
>> call/method            100   106   101   102    97   104   100   105
>> call/wantarray         100   109    98   101   100   102    98   110
>> hash/copy              100    85   102    81   101    78   104    88
>> hash/each              100    94   102    88    85    88   102    93
>> hash/foreach-sort      100    97    99    97   100    94   101    96
>> hash/foreach           100    96    98    95   103    93   102    94
>> hash/get               100   101    98   101   100   102   101   102
>> hash/set               100    96   102   102   100   101   102    91
>> loop/for-c             100   106   111   105   101   106   109   106
>> loop/for-range-const   100    99    99    97    96    97    94    98
>> loop/for-range         100   100   101    92    99    98    99    99
>> loop/getline           100   104    98   104   100   103   100   104
>> loop/while-my          100   103   101    99   100   101    99    99
>> loop/while             100    71   100    96    96    98   101    99
>> re/const               100    99    99    99   100    97    99    99
>> re/w                   100    99   100   101    98   100   101    97
>> startup/fewmod         100    98    99    97   100    96    98    98
>> startup/lotsofsub      100    98   100    98   100    98   100    98
>> startup/noprog         100   101    79    79   100    79    79   100
>> string/base64          100   100    99    99   100   100    98    99
>> string/htmlparser      100    98   108   105   100   105   107    98
>> string/index-const     100   100    98   100   100   101    99   101
>> string/index-var       100   100    98    99   100   100   100    99
>> string/ipol            100   108   107   107   108   106   108   106
>> string/tr              100   101   100   101    99   101   101   102
>
>> AVERAGE                100    99   100    98   100    98   100    99
>
>> Key: columns A and E are blead ed2b02642a84b031, B and H are +PURIFY,
>> C and G are +jemalloc, and D and F are +PURIFY +jemalloc.
>
>> It's not much, so I'm not sure if it's noise or "signal". If it's signal,
>> it's suggesting that glibc malloc is fractionally better than using arenas,
>> and jemalloc fractionally better still. But not much. (And that with arenas,
>> the choice of malloc doesn't seem to matter.)
>
>> Would anyone like to pursue this further?
>
>> jemalloc is BSD-licensed, actively maintained, and likely to improve, so
>> potentially we could ship it as a replacement for the current malloc.c.
>
>> However, I'm not sure how easy it would be to integrate. We're not in a
>> position to enforce the use of LD_PRELOAD to swap out the libc malloc, so,
>> just like the current malloc.c, we'd have to do a bit more to rename the
>> symbols and to play nicely with the system malloc, particularly if both
>> use sbrk().
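>>
>> If I'm reading jemalloc's build options right, it can be configured with a
>> symbol prefix (--with-jemalloc-prefix=je_, say), so that its entry points
>> coexist with the libc malloc rather than replacing it. A hypothetical sketch,
>> assuming such a prefixed build and linking with -ljemalloc:
>>
>>     #include <stdio.h>
>>     #include <stdlib.h>               /* system malloc/free              */
>>     #include <jemalloc/jemalloc.h>    /* je_malloc/je_free, prefixed     */
>>
>>     int main(void)
>>     {
>>         char *a = malloc(64);         /* libc allocator                  */
>>         char *b = je_malloc(64);      /* jemalloc, via renamed symbols   */
>>
>>         printf("libc %p, jemalloc %p\n", (void *)a, (void *)b);
>>
>>         free(a);                      /* each block must be returned     */
>>         je_free(b);                   /* to the allocator that made it   */
>>         return 0;
>>     }
>>
>> Whether that is enough to play nicely with an sbrk()-based libc malloc is a
>> separate question.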
>
>> Nicholas Clark
>



-- 
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
.----'-/`-/     olga.kryzhanov...@gmail.com   \-`\-'----.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`