On May 11, 2015, at 1:19 PM, Mayank Kumar (mayankum) <mayan...@cisco.com> wrote:
> - Our processes use setrlimit to limit virtual memory usage of processes. Do 
> you think jemalloc in some ways could overshoot that limit, or might it be 
> doing something funky which is not tracked through setrlimit (like not going 
> through brk/mmap/mremap)?  Please excuse my limited understanding here.

jemalloc only uses mmap() and sbrk() to map memory on Unix-like systems.
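
To make the point above concrete: on Linux, RLIMIT_AS is enforced on mmap() 
and brk()/sbrk(), so an allocator that maps memory only through those calls 
stays under the limit. Below is a minimal sketch, assuming a Linux-like 
system. The helper name try_mmap_under_as_limit is hypothetical and is not 
part of jemalloc; it only shows a plain mmap() being refused once the 
address-space limit is lowered.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not a jemalloc API): lower the soft RLIMIT_AS to
 * limit_bytes, attempt one anonymous mmap() of request_bytes, then restore
 * the previous soft limit.
 *
 * Returns 1 if the mapping succeeded, 0 if it failed with ENOMEM,
 * and -1 on any other error.
 */
int try_mmap_under_as_limit(size_t limit_bytes, size_t request_bytes)
{
    struct rlimit saved, lowered;
    void *p;
    int result;

    if (getrlimit(RLIMIT_AS, &saved) != 0)
        return -1;

    lowered = saved;
    lowered.rlim_cur = (rlim_t)limit_bytes;
    /* The soft limit may not exceed the hard limit. */
    if (saved.rlim_max != RLIM_INFINITY && lowered.rlim_cur > saved.rlim_max)
        lowered.rlim_cur = saved.rlim_max;
    if (setrlimit(RLIMIT_AS, &lowered) != 0)
        return -1;

    /* The same kind of call an allocator makes to obtain fresh memory. */
    p = mmap(NULL, request_bytes, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) {
        munmap(p, request_bytes);
        result = 1;
    } else {
        result = (errno == ENOMEM) ? 0 : -1;
    }

    /* Only the soft limit was changed, so it can be raised back. */
    if (setrlimit(RLIMIT_AS, &saved) != 0)
        return -1;
    return result;
}
```

For example, try_mmap_under_as_limit((size_t)1 << 30, (size_t)1 << 31) asks 
for 2 GiB under a 1 GiB limit and is expected to return 0 on Linux. Other 
systems may enforce RLIMIT_AS differently or not at all.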

> - Someone pointed me to this link:
> http://locklessinc.com/benchmarks_allocator.shtml
> It says the following:
> 
> <quote>
> Jemalloc allocator
> 
> This is a very good allocator when there is a large amount of contention, 
> performing similarly to the Lockless memory allocator as the number of 
> threads grows larger than the number of processors. However, when the number 
> of allocating threads is smaller than the total number of cpus, it isn't 
> quite as fast. The disadvantage of the jemalloc allocator is its memory 
> usage. It uses power-of-two sized bins, which leads to a greatly increased 
> memory footprint compared to other allocators. This can affect real-world 
> performance due to excess cache and TLB misses.
> </quote>
> 
> Do you think this is still true? This might be an old link, or just my limited 
> understanding. Of course they are selling here, but I just wanted your 
> opinion. In our case, though, the number of allocating threads will always be 
> larger than the number of cores.

The above was a combination of incorrect/incomplete information and 
microbenchmark-based overgeneralization even at the time it was written ~4 
years ago.  Specific issues:

- MP-scalable malloc implementations *avoid* contention in order to perform 
well.  The t-test1 microbenchmark as run did not induce appreciable contention 
in jemalloc.

- jemalloc's typically low memory usage has been a distinguishing quality since 
2006.  To claim otherwise based on one microbenchmark is unjustifiable.

- jemalloc has at various times used power-of-two-*spaced* bins for limited 
size ranges, e.g. 1024..2048..4096..8192 and 4MiB..8MiB, but it has never done 
so universally.  I suspect the author misread my BSDCan paper 
(http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf), and 
mistook the binary buddy page management system for size classes.  However, the 
binary buddy page management system was replaced long before jemalloc 1.0.0.
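
To illustrate why the spacing of size classes matters for footprint, here is 
a small sketch that compares rounding every request up to a power of two with 
rounding up to one of four evenly spaced classes per doubling. Both functions 
are hypothetical illustrations and are not jemalloc's actual size-class 
tables, which differ between versions.

```c
#include <stddef.h>

/*
 * Smallest power of two >= n (returns 1 for n == 0).
 * Assumes n <= SIZE_MAX / 2 + 1 so the shift cannot overflow.
 */
size_t round_up_pow2(size_t n)
{
    size_t c = 1;
    while (c < n)
        c <<= 1;
    return c;
}

/*
 * Hypothetical finer scheme: four evenly spaced classes per doubling,
 * e.g. 1280, 1536, 1792, 2048 for requests in (1024, 2048].
 */
size_t round_up_quarter_spaced(size_t n)
{
    size_t hi = round_up_pow2(n);   /* upper end of the doubling */
    if (hi < 8)
        return hi;                  /* too small to split into four steps */
    size_t lo = hi / 2;             /* lower end of the doubling */
    size_t step = lo / 4;           /* spacing between classes */
    /* Round the offset above lo up to a whole number of steps. */
    return lo + ((n - lo + step - 1) / step) * step;
}
```

A 1025-byte request rounds to 2048 bytes with pure power-of-two classes 
(almost half the space unused) but to 1280 bytes with the finer spacing 
(about 20% unused), which is the kind of difference the footprint claim 
hinges on.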

On the bright side, the benchmarks report actual performance results for a 
version of jemalloc, unlike a previous version of that page, which erroneously 
reported glibc results, or an interim update which categorically blamed a 
memalign() call with questionable alignment and the resulting crashes on 
jemalloc.

Note that the Lockless malloc implementation has since been open-sourced, so 
you can conduct your own tests and see how well it works for your use case.

Jason
_______________________________________________
jemalloc-discuss mailing list
jemalloc-discuss@canonware.com
http://www.canonware.com/mailman/listinfo/jemalloc-discuss
