I want to start by acknowledging that all of the pros you listed for mimalloc are accurate.
I also want to point out the times people have been caught off guard by the perceived increase in memory allocation with mimalloc compared to the alternatives, e.g.:

https://github.com/microsoft/mimalloc/issues/393
https://github.com/apache/arrow/issues/37361

From your investigation, Antoine, it does seem that the bulk of this is actually "pre-allocated virtual memory":
https://github.com/apache/arrow/issues/40301
and thus people generally don't need to be concerned about it.

I'm working on a blog post that will hopefully decrease the number of "mimalloc is over-allocating memory" issues we get. I think I should get it ready in anticipation of the switch. We don't want a bunch of open issues about memory leaks.

On Wed, 5 Jun 2024 at 08:18, Antoine Pitrou <anto...@python.org> wrote:
>
> Hello,
>
> Arrow C++ features a MemoryPool abstraction that allows using different
> allocators interchangeably. Several MemoryPool implementations are
> provided with Arrow C++ (though one can also build their own):
>
> - a jemalloc-based implementation, currently the default on Linux
> - a mimalloc-based implementation, currently the default on macOS and
>   Windows
> - an implementation that defers to the system's standard allocator
>   (using the malloc() and free() calls), available as a fallback and for
>   experimentation
>
> While jemalloc is the current default on Linux, our continuous
> benchmarking infrastructure actually enables mimalloc instead.
> Therefore, I've made a draft PR that switches our benchmarking to
> jemalloc, so as to measure any concrete differences between the two:
> https://github.com/apache/arrow/pull/41205
>
> The results show that there is a large number of performance drops with
> large effect sizes on the C++ microbenchmarks. There is also a smaller
> number of C++ microbenchmarks with improved performance results. A
> summary report with links to detailed results can be found here:
> https://github.com/apache/arrow/runs/25745674261
>
>
> With this in mind, I would like to propose that we switch the default to
> mimalloc for all platforms. This would have several desirable effects:
>
> - less variability between platforms
> - mimalloc generally has a nicer, more consistent API and is easier to
>   work with (in particular, jemalloc's configuration scheme is slightly
>   abstruse)
> - potentially better performance, or at least not significantly worse,
>   than the status quo
>
> We would have to keep at least one CI job with jemalloc enabled, to make
> sure we're not regressing in that regard.
>
> What do you think?
>
> Regards
>
> Antoine.
>