Hello,

Arrow C++ features a MemoryPool abstraction that allows using different allocators interchangeably. Several MemoryPool implementations are provided with Arrow C++ (though one can also build their own):

- a jemalloc-based implementation, currently the default on Linux
- a mimalloc-based implementation, currently the default on macOS and Windows
- an implementation that defers to the system's standard allocator (using the malloc() and free() calls), available as a fallback and for experimentation

While jemalloc is the current default on Linux, our continuous benchmarking infrastructure actually enables mimalloc instead. Therefore, I've made a draft PR that switches our benchmarking to jemalloc, so as to measure any concrete differences between the two:
https://github.com/apache/arrow/pull/41205

The results show a large number of C++ microbenchmark regressions with large effect sizes when using jemalloc, along with a smaller number of C++ microbenchmarks that improve. A summary report with links to detailed results can be found here:
https://github.com/apache/arrow/runs/25745674261

With this in mind, I would like to propose that we switch the default to mimalloc for all platforms. This would have several desirable effects:

- less variability between platforms
- mimalloc generally has a nicer, more consistent API and is easier to work with (in particular, jemalloc's configuration scheme is somewhat abstruse)
- potentially better performance than the status quo, or at least not significantly worse

We would have to keep at least one CI job with jemalloc enabled, to make sure we're not regressing in that regard.

What do you think?

Regards

Antoine.
