On 06/08/2014 18:12, Jason Evans wrote:
On Aug 6, 2014, at 8:55 AM, Guillaume Holley <ghol...@cebitec.uni-bielefeld.de>
wrote:
On 06/08/2014 01:10, Jason Evans wrote:
On Aug 5, 2014, at 10:35 AM, ghol...@cebitec.uni-bielefeld.de wrote:
I’m currently working on a data structure allowing the storage of a
dynamic set of short DNA sequences plus annotations.
Here are few details : the data structure is written in C, tests are
currently run on Ubuntu 14.04 64 bits, everything is single threaded and
Valgrind indicates that the program which manipulates the data structure
has no memory leaks.
I’ve started using jemalloc in an attempt to reduce memory fragmentation
(by using one arena, disabling the thread caching system and using a high
ratio of dirty pages). On small data sets (30 million insertions), the
results are very good compared with glibc: about 150 MB less with tuned
jemalloc.
Now I’ve started tests with much bigger data sets (3 to 10 billion
insertions), and I realized that jemalloc uses more memory than glibc.
I generated a data set of 200 million entries which I tried to insert into
the data structure; when the memory used reached 1 GB, I stopped the
program and recorded the number of entries inserted.
With jemalloc, regardless of the tuning parameters (1 or 4 arenas, tcache
enabled or not, lg_dirty = 3, 8 or 16, lg_chunk = 14, 22 or 30), the
number of entries inserted varies between 120 million and 172 million,
whereas with standard glibc I am able to insert 187 million entries.
And on billions of entries, glibc (I don’t have precise numbers,
unfortunately) uses a few gigabytes less than jemalloc.
So I would like to know whether there is an explanation for this, and
whether I can do something to make jemalloc at least as memory-efficient
as glibc in my tests. Maybe I’m not using jemalloc correctly?
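As a side note, the tuning knobs mentioned above do not require rebuilding jemalloc: they can be set through the MALLOC_CONF environment variable, or baked into the binary through an application-provided string. A minimal sketch (the option values are just the ones experimented with above, not a recommendation):

```c
/* jemalloc reads this application-provided string at startup when the
 * program is linked against it; it is equivalent to setting the
 * MALLOC_CONF environment variable. Values shown are illustrative,
 * taken from the experiments described above. */
const char *malloc_conf = "narenas:1,tcache:false,lg_dirty_mult:8";
```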
There are a few possible issues, mainly related to fragmentation, but I can't
make many specific guesses because I don't know what the
allocation/deallocation patterns are in your application. It sounds like your
application just does a bunch of allocation, with very little interspersed
deallocation, in which case I'm surprised by your results unless you happen to
be allocating lots of objects that are barely larger than the nearest size
class boundaries (e.g. 17 bytes). Have you taken a close look at the output of
malloc_stats_print()?
Jason
Hi and thank you for the help.
As you guessed, my application does a lot of allocations with very few
deallocations, but the allocated memory is very often reallocated.
However, no single allocation in my application exceeds 600 KB, so there
are no huge objects.
I had a look at the malloc_stats_print() output (enclosed at the end of
this answer), and I see that for bins of size 64 I seem to make a huge
number of allocations/reallocations for a small amount of memory
allocated; maybe this generates a lot of fragmentation, but I don't know
to what extent. Do you think my problem could be linked to this, and if
so, is there something I can do in jemalloc to solve it?
Here are the jemalloc statistics for 200 million insertions, with jemalloc
tuned with the following parameters: "narenas:1,tcache:false,lg_dirty_mult:8"
___ Begin jemalloc statistics ___
[...]
Allocated: 1196728936, active: 1212567552, mapped: 1287651328
The overall external fragmentation, 1 - (allocated/active), is 1.3%,
which is very low.
Current active ceiling: 16416505856
chunks: nchunks highchunks curchunks
307 307 307
huge: nmalloc ndalloc allocated
0 0 0
arenas[0]:
assigned threads: 1
dss allocation precedence: disabled
dirty pages: 296037:66 active:dirty, 25210 sweeps, 90046 madvises, 627169 purged
allocated nmalloc ndalloc nrequests
small: 1116037736 372536523 370598419 372536523
large: 80691200 223900 220617 223900
total: 1196728936 372760423 370819036 372760423
active: 1212567552
mapped: 1283457024
bins: bin size regs pgs allocated nmalloc ndalloc nrequests
nfills nflushes newruns reruns curruns
[...]
21 1280 51 16 600357120 641800 172771 641800
0 0 9221 184785 9197
The only aspect of jemalloc that might be causing you problems is size class
rounding. IIRC glibc's malloc spaces its small size classes closer together.
If your application allocates lots of 1025-byte objects, that could cost you
nearly 20% in terms of memory usage for those allocations (~120 MB in this
case).
Jason
I will try to see if I can solve that, thank you for the help !
Guillaume
_______________________________________________
jemalloc-discuss mailing list
jemalloc-discuss@canonware.com
http://www.canonware.com/mailman/listinfo/jemalloc-discuss