Phong, the mail was from Nicholas Clark <n...@ccl4.org>, which I forwarded from the perl-porters list. I just found it interesting how the new AST vmalloc compares to the competition.
Nicholas, can you answer Phong's question (below), please? You may need to
subscribe to
https://mailman.research.att.com/mailman/listinfo/ast-developers/ first.

Olga

On Thu, Aug 16, 2012 at 12:02 AM, Phong Vo <k...@research.att.com> wrote:
>
> I didn't know about Jemalloc but just downloaded it and ran it with my sim
> code. This is just a single data point but Jemalloc seems slow and uses too
> much memory.
>
> The parameters of the run were:
>   -a10000000: 10,000,000 allocations.
>   -t18: eighteen threads.
>   -z10 and -Z1000: block sizes are random in the range [10,1000].
>   -l100: each allocated block lives for a number of steps randomly picked
>     from [1,100], then maybe realloced or freed.
>   -e0.01: at each allocation step, there is a 1% chance that an ephemeral
>     thread is created and run to allocate/free a small number of objects.
>
> The numbers to look at below are "elapse time" and "efficiency". Efficiency
> is defined as the ratio of the maximum amount of busy memory (ie, "usage")
> at any given time to the total memory gotten from the OS (ie, "process")
> for allocation.
>
> The "self" line reports user+sys times and numbers of context switches of
> the main process. Vmalloc allows shared memory regions to be allocated
> concurrently from different processes. If we had done that, there would be
> more reporting data for subprocesses. But here we only test a single
> process with multiple threads.
>
> ---------
> te:g:.../Vmalloc_t$ runsafe -a10000000 -t18 -z10 -Z1000 -l100 -e0.01
>
> t.malloc-vm
>   elapse time=6.0339s memory[process=390082560, usage=320892840, efficiency=0.82]
>   self: time=20.5539s[user=10.3794s, sys=10.1745s], csw=499145[voluntary=494822, forced=4323]
>
> t.malloc-je
>   elapse time=9.0062s memory[process=562839552, usage=320892840, efficiency=0.57]
>   self: time=33.5089s[user=16.7435s, sys=16.7655s], csw=789582[voluntary=783360, forced=6222]
> --------
>
> Vmalloc was faster than Jemalloc but in a real application that difference
> might be minimal. I guess the alarming part in this test run was that the
> efficiency of jemalloc was just 57%, meaning that it was taking roughly
> twice the amount of system memory to manage the maximum busy memory that
> the process needed. Vmalloc's efficiency was 82%. The more threads, the
> more fragmentation will occur, but a good malloc needs to manage this, else
> for large and long-running applications, things can turn out badly.
> Jemalloc may need more work there.
>
> Do you know if the test suite for Jemalloc is available publicly? What was
> the meaning of the table of numbers in your email?
>
> Phong
>
>
>> From olga.kryzhanov...@gmail.com Wed Aug 15 16:39:25 2012
>> Subject: Fwd: jemalloc
>> To: ast-developers@research.att.com, Phong Vo <k...@research.att.com>
>
>> Phong, how does the new AST vmalloc compare to jemalloc?
>
>> Olga
>
>> ---------- Forwarded message ----------
>> From: Nicholas Clark <n...@ccl4.org>
>> Date: Wed, Aug 15, 2012 at 9:27 PM
>> Subject: jemalloc
>> To: perl5-port...@perl.org
>
>> Artur and Tim Bunce suggested investigating jemalloc, which is a high
>> performance malloc implementation now used by (among others) FreeBSD and
>> Facebook. Artur also suggested that our use of arenas of memory (for SV
>> bodies) is no longer the best idea, given that malloc() implementations
>> have got better. Fortunately arenas are easy to disable, by compiling with
>> -DPURIFY.
>
>> So here is a comparison of blead (on dromedary, -Os, no threads), default,
>> compiled with -DPURIFY, default using an LD_PRELOAD to force the use of
>> jemalloc 3.0.0, and finally compiled with -DPURIFY and using jemalloc.
>
>> Not having anything fantastically better to hand, this is perlbench, with
>> each of the 4 run twice.
>
>> IIRC smaller numbers are better, and anything less than 5% is probably noise:
>
>>                           A    B    C    D    E    F    G    H
>>                         ---  ---  ---  ---  ---  ---  ---  ---
>> arith/mixed             100  101  101   98  102   98  101  101
>> arith/trig              100  101  101   99  100   98   99  100
>> array/copy              100  101   95  101  101  100  102  100
>> array/foreach           100   79  102   76  101   76  101   79
>> array/index             100  112  101  105  100  112  101  110
>> array/pop               100  103  100  100  102  102  102  102
>> array/shift             100  101   97   98  100  100  101  100
>> array/sort-num          100  103  100  103  100  103  100  102
>> array/sort              100   87   98   84  100   84   97   87
>> call/0arg               100  111  100  104  107  104  102  108
>> call/1arg               100   99  103   96  106   96  103   97
>> call/2arg               100  105   97   99   96  100   95  103
>> call/9arg               100  103   98  102  101   94   99  103
>> call/empty              100  102   99  102   99   97   96  103
>> call/fib                100  100  100   97   97  100  101  101
>> call/method             100  106  101  102   97  104  100  105
>> call/wantarray          100  109   98  101  100  102   98  110
>> hash/copy               100   85  102   81  101   78  104   88
>> hash/each               100   94  102   88   85   88  102   93
>> hash/foreach-sort       100   97   99   97  100   94  101   96
>> hash/foreach            100   96   98   95  103   93  102   94
>> hash/get                100  101   98  101  100  102  101  102
>> hash/set                100   96  102  102  100  101  102   91
>> loop/for-c              100  106  111  105  101  106  109  106
>> loop/for-range-const    100   99   99   97   96   97   94   98
>> loop/for-range          100  100  101   92   99   98   99   99
>> loop/getline            100  104   98  104  100  103  100  104
>> loop/while-my           100  103  101   99  100  101   99   99
>> loop/while              100   71  100   96   96   98  101   99
>> re/const                100   99   99   99  100   97   99   99
>> re/w                    100   99  100  101   98  100  101   97
>> startup/fewmod          100   98   99   97  100   96   98   98
>> startup/lotsofsub       100   98  100   98  100   98  100   98
>> startup/noprog          100  101   79   79  100   79   79  100
>> string/base64           100  100   99   99  100  100   98   99
>> string/htmlparser       100   98  108  105  100  105  107   98
>> string/index-const      100  100   98  100  100  101   99  101
>> string/index-var        100  100   98   99  100  100  100   99
>> string/ipol             100  108  107  107  108  106  108  106
>> string/tr               100  101  100  101   99  101  101  102
>
>> AVERAGE                 100   99  100   98  100   98  100   99
>
>> ed2b02642a84b031       A    A
>> +PURIFY                B    B
>> +jemalloc              C    C
>> +PURIFY +jemalloc      D    D
>
>> It's not much, so I'm not sure if it's noise or "signal". If it's signal,
>> it's suggesting that glibc malloc is fractionally better than using arenas,
>> and jemalloc fractionally better still. But not much. (And that with
>> arenas, malloc doesn't seem to matter.)
>
>> Would anyone like to pursue this further?
>
>> jemalloc is BSD licensed, actively maintained and likely to improve, so
>> potentially we could ship it as a replacement for the current malloc.c
>
>> However, I'm not sure how easy it would be to integrate. We're not in a
>> position to enforce the use of LD_PRELOAD to swap out the libc malloc, so
>> just like the current malloc.c we'd have to do a bit more to rename the
>> symbols, and to play nicely with the system malloc, particularly if both
>> use sbrk().
>
>> Nicholas Clark
>
>> --
>>       ,   _                                    _   ,
>>      { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
>> .----'-/`-/     olga.kryzhanov...@gmail.com   \-`\-'----.
>>  `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
>>       /\/\     Solaris/BSD//C/C++ programmer   /\/\
>>       `--`                                      `--`

_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
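
For anyone who wants to check the memory figures in Phong's run, here is a minimal sketch that recomputes them from the reported "process" and "usage" byte counts. It is not part of the original test harness: the function names and the dictionary layout are assumptions made only for illustration. The reported efficiency values (0.82 and 0.57) correspond to usage divided by process.

```python
# Byte counts and elapsed times copied from the quoted run
# (runsafe -a10000000 -t18 -z10 -Z1000 -l100 -e0.01).
# The dictionary layout and function names are illustrative assumptions.
RUNS = {
    "t.malloc-vm": {"process": 390082560, "usage": 320892840, "elapsed": 6.0339},
    "t.malloc-je": {"process": 562839552, "usage": 320892840, "elapsed": 9.0062},
}


def efficiency(process_bytes, usage_bytes):
    """Peak busy memory as a fraction of the memory obtained from the OS."""
    return usage_bytes / process_bytes


def overhead_factor(process_bytes, usage_bytes):
    """Bytes obtained from the OS per byte of peak busy memory."""
    return process_bytes / usage_bytes


for name, run in RUNS.items():
    eff = efficiency(run["process"], run["usage"])
    over = overhead_factor(run["process"], run["usage"])
    print(f"{name}: elapsed={run['elapsed']}s "
          f"efficiency={eff:.2f} overhead={over:.2f}x")
```

On these numbers the overhead factor for jemalloc comes out near 1.75, which is the "roughly twice" in Phong's message; for vmalloc it is about 1.22.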