On Thu, Nov 25, 2021 at 04:38:27PM +0500, ???? ??????? wrote:
> > Thus I think that instead of focusing on the OS we ought to continue
> > to focus on the allocator and improve runtime detection:
> >
> >   - glibc (currently detected using detect_allocator)
> >     => use malloc_trim()
> >   - jemalloc at build time (mallctl != NULL)
> >     => use mallctl() as you did
> >   - jemalloc at runtime (mallctl == NULL but dlsym("mallctl") != NULL)
> >     => use mallctl() as you did
> >   - others
> >     => no trimming
> >
> 
> I never imagined before that high-level applications (such as a reverse
> HTTPS/TCP proxy) would care about such low-level things as allocator
> behaviour. No joke, really.

Yes, it counts a lot. That's also why we spent a lot of time optimizing
the pools: to limit the number of calls to the system's allocator for
everything that uses a fixed size. I've seen performance graphs in our
internal ticket tracker showing the memory consumption before and after
the switch to jemalloc, and the CPU usage as well, and sometimes the
difference was significant.
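
For illustration, here is a rough sketch of the detection logic from the
list quoted above. The structure and helper names are mine, not HAProxy's
actual detect_allocator() code, and the "thread.tcache.flush" mallctl is
just one documented jemalloc operation; the patch under discussion may
invoke a different one. Build with -ldl on older glibc:

  #define _GNU_SOURCE
  #include <dlfcn.h>
  #include <stddef.h>

  typedef int (*mallctl_fn)(const char *, void *, size_t *, void *, size_t);

  /* jemalloc's control entry point; declared weak so linking still
   * succeeds when jemalloc is not present at build time.
   */
  extern int mallctl(const char *, void *, size_t *, void *, size_t)
          __attribute__((weak));

  static mallctl_fn my_mallctl;          /* set if jemalloc is detected */
  static int (*my_malloc_trim)(size_t);  /* set if glibc is detected */

  static void detect_allocator(void)
  {
          my_mallctl = mallctl;          /* jemalloc linked at build time? */
          if (!my_mallctl)               /* maybe loaded at runtime (LD_PRELOAD) */
                  my_mallctl = (mallctl_fn)dlsym(RTLD_DEFAULT, "mallctl");
          if (!my_mallctl)               /* glibc exports malloc_trim() */
                  my_malloc_trim = (int (*)(size_t))dlsym(RTLD_DEFAULT,
                                                          "malloc_trim");
  }

  static void trim_memory(void)
  {
          if (my_mallctl)
                  /* jemalloc: flush this thread's cache back to the arenas */
                  my_mallctl("thread.tcache.flush", NULL, NULL, NULL, 0);
          else if (my_malloc_trim)
                  /* glibc's ptmalloc: return free heap pages to the OS */
                  my_malloc_trim(0);
          /* any other allocator: no trimming */
  }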

Glibc improved quite a bit recently (in 2.26, with the per-thread tcache
added to its ptmalloc). But in our case it's still not as good as
jemalloc, and neither performs as well as our thread-local pools for
fixed sizes.
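
To give an idea of why such pools beat a general-purpose allocator, here
is a toy sketch of the thread-local fixed-size pool principle. This is
purely illustrative (the names pool_alloc/pool_free and the single-list
layout are mine, not HAProxy's actual pool code): frees go onto a
per-thread free list, so most allocations never reach the system
allocator and need no locking at all:

  #include <stdlib.h>

  struct pool_item {
          struct pool_item *next;
  };

  #define POOL_OBJ_SIZE 1024   /* all objects in this pool share one size */

  static __thread struct pool_item *pool_head;  /* per-thread free list */

  static void *pool_alloc(void)
  {
          struct pool_item *item = pool_head;

          if (item) {                    /* fast path: reuse a local object */
                  pool_head = item->next;
                  return item;
          }
          return malloc(POOL_OBJ_SIZE);  /* slow path: hit the allocator */
  }

  static void pool_free(void *ptr)
  {
          struct pool_item *item = ptr;

          item->next = pool_head;        /* no lock needed: thread-local */
          pool_head = item;
  }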

I've seen in a paper about snmalloc that it performs exceptionally well
for small allocations. I just don't know how this degrades depending on
the access patterns. For example, some allocators are fast when you
free() in the exact reverse of the allocation order, but can start to
fragment, or have to work harder to find holes, if you free() in a
different order.
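
A quick (and admittedly crude) way to probe that sensitivity is a
micro-benchmark along these lines; the numbers are only indicative since
real workloads interleave allocations and frees much more:

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define NB_BLOCKS 1000000
  #define BLK_SIZE  64

  /* allocate NB_BLOCKS blocks, then time free() in allocation order
   * (reverse == 0) or in reverse order (reverse == 1).
   */
  static double run(int reverse)
  {
          static void *ptr[NB_BLOCKS];
          struct timespec t0, t1;
          int i;

          for (i = 0; i < NB_BLOCKS; i++)
                  ptr[i] = malloc(BLK_SIZE);

          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (i = 0; i < NB_BLOCKS; i++)
                  free(ptr[reverse ? NB_BLOCKS - 1 - i : i]);
          clock_gettime(CLOCK_MONOTONIC, &t1);

          return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
  }

  int main(void)
  {
          printf("free() in alloc order:   %.3fs\n", run(0));
          printf("free() in reverse order: %.3fs\n", run(1));
          return 0;
  }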

But that's something to keep an eye on in the future.

Willy
