On 01-Feb-22 15:38, Thomas Dineen via curl-library wrote:
> Gentle People:
>
> I know that this is a bit off from the current thread, but consider this:
> "Few mallocs is good" -- why? One issue is that in many (all) operating
> systems malloc is a system call, where a system call causes one or more
> context switches, and each context switch is considered high overhead
> (read: time).
>
> Consider this: what if you create your own local malloc, which I will
> call LMalloc()? LMalloc is initialized with a very large block via a
> single initial call to malloc (keeping in mind that this could be
> increased later), with the goal of very few large calls to malloc. When
> the user wants memory via the traditional malloc(), he uses LMalloc()
> instead, where LMalloc maintains a local memory-management pool and
> allocates memory on request. At the user-program API, the user sees
> LMalloc() behave EXACTLY like malloc() (WITHOUT SYSTEM CALLS).
>
> Thomas Dineen
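For concreteness, the proposal boils down to something like the sketch
below. The names lmalloc_init()/lmalloc() are hypothetical, and freeing,
pool growth, and thread safety are all left out - it is only meant to show
the bump-pointer idea, not to be used.

/* Hypothetical sketch of the proposed "LMalloc": one big malloc() up
 * front, then a bump-pointer suballocator.  No lfree(), no pool growth,
 * no locking - just enough to show the idea. */
#include <stdlib.h>
#include <stddef.h>

static unsigned char *pool;   /* start of the one big block         */
static size_t pool_size;      /* total bytes obtained from malloc() */
static size_t pool_used;      /* bytes handed out so far            */

int lmalloc_init(size_t bytes)
{
  pool = malloc(bytes);       /* the single "expensive" allocation */
  if(!pool)
    return -1;
  pool_size = bytes;
  pool_used = 0;
  return 0;
}

void *lmalloc(size_t bytes)
{
  /* round up so returned pointers stay suitably aligned */
  size_t align = _Alignof(max_align_t);
  size_t need = (bytes + align - 1) & ~(align - 1);
  if(need > pool_size - pool_used)
    return NULL;              /* a real version would grow the pool */
  void *p = pool + pool_used;
  pool_used += need;
  return p;
}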
The premise that malloc() means a syscall isn't quite correct for any OS
I've worked with - and that's a lot of them. malloc() is implemented by a
runtime library. The RTLs use a lot of different strategies, but the RTL
call generally does not involve a context-switching syscall. Any reasonably
modern RTL will:

- maintain one or more free lists, typically keyed on common allocation
  sizes (see the sketch after this list);
- allocate from zones that make returning free'd memory efficient;
- possibly allocate more memory than requested, to reduce fragmentation and
  to avoid copies on realloc();
- possibly request physical placement based on NUMA;
- use granular locking (with clever locking strategies) for threaded
  environments;
- and more.
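To make the free-list point concrete, here is a generic sketch (not how any
particular RTL actually organizes its bins; coalescing and locking are
omitted): blocks of a common size are recycled through a per-size list, so
most allocations never go near the OS.

/* Generic sketch of one size class: freed 64-byte blocks are kept on a
 * singly linked list and handed back out before asking the heap again. */
#include <stdlib.h>

#define CHUNK 64                   /* one example size class */

struct freenode {
  struct freenode *next;
};

static struct freenode *freelist;  /* cached 64-byte blocks */

void *alloc64(void)
{
  if(freelist) {                   /* fast path: reuse a cached block */
    void *p = freelist;
    freelist = freelist->next;
    return p;
  }
  return malloc(CHUNK);            /* slow path: fall back to malloc() */
}

void free64(void *p)
{
  struct freenode *n = p;          /* recycle instead of releasing */
  n->next = freelist;
  freelist = n;
}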
Only when the RTL's preallocated zones and free lists are unable to satisfy a request will a syscall to the OS be made to obtain more memory. Unless your application is pathological, this will not be frequent.
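You can watch this happen with glibc (other RTLs have similar, differently
named introspection). In the sketch below, malloc_stats() - a glibc
extension in <malloc.h> - shows the allocator holding on to freed memory,
and the second round of allocations is satisfied from that cache; running
it under strace -e trace=brk,mmap should show no new memory syscalls for
the second round.

/* glibc-specific illustration: a batch of freed blocks stays in the
 * allocator's arena and is reused, so the second allocation round does
 * not need more memory from the OS. */
#include <stdlib.h>
#include <malloc.h>        /* malloc_stats() is a glibc extension */

int main(void)
{
  enum { N = 100 };
  void *p[N];
  int i;

  for(i = 0; i < N; i++)
    p[i] = malloc(256);
  for(i = 0; i < N; i++)
    free(p[i]);            /* blocks go back onto the arena's free lists */

  malloc_stats();          /* "system bytes" vs. "in use bytes" shows how
                              much memory the RTL is holding on to */

  for(i = 0; i < N; i++)
    p[i] = malloc(256);    /* satisfied from the cached free lists */
  for(i = 0; i < N; i++)
    free(p[i]);
  return 0;
}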
There was a time when the RTLs' malloc() implementations were less sophisticated, and the strategy that you suggest could have been beneficial - especially when the application's use pattern didn't match the RTL's strategies. But any RTL today already does what you suggest. Probably better. And it's debugged.
Much work - and many academic papers - have gone into the RTLs' implementations of dynamic memory. It's rarely the case that a custom implementation will do sufficiently better to justify the effort. (There are a few tools that do use custom implementations for debugging and detecting misuse.)
That said, there is a non-zero cost to allocation, deallocation, and (especially some) resizing. It can be worthwhile to avoid unnecessary use of heap memory. And if you expect megabytes, don't do an initial allocation of 1 byte. If you know a reasonable upper bound on a stream, allocate that rather than increasing your buffer a byte at a time. Making unnecessary copies can also add up, which is why many languages (and packages) will use reference counts instead. (There are more papers on garbage collection algorithms...)
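As a sketch of that advice (generic code, not from curl): growing a buffer
geometrically needs only O(log n) realloc() calls and copies for n appended
bytes, where growing a byte at a time needs one per byte.

/* Geometric buffer growth: double the capacity whenever it runs out. */
#include <stdlib.h>
#include <string.h>

struct buf {
  char *data;
  size_t len;
  size_t cap;
};

int buf_append(struct buf *b, const char *src, size_t n)
{
  if(b->len + n > b->cap) {
    size_t newcap = b->cap ? b->cap : 64;  /* reasonable starting size */
    while(newcap < b->len + n)
      newcap *= 2;                         /* grow geometrically */
    char *p = realloc(b->data, newcap);    /* one copy, not one per byte */
    if(!p)
      return -1;                           /* a production version would
                                              also guard size_t overflow */
    b->data = p;
    b->cap = newcap;
  }
  memcpy(b->data + b->len, src, n);
  b->len += n;
  return 0;
}

A zero-initialized struct buf works as the starting state, since
realloc(NULL, size) behaves like malloc(size).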
This area has been studied, engineered, and improved since the '60s.

Bottom line: it's not as simple as you suggest to do better than the RTL. And heap memory is a useful tool that simplifies programming, so it should be used. But like any sharp tool, you do need to understand how to avoid being hurt by misuse.
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or
my employer's views, if any, on the matters discussed.