Hello,
Arrow C++ features a MemoryPool abstraction that allows using different
allocators interchangeably. Several MemoryPool implementations are
provided with Arrow C++ (though one can also build their own):
- a jemalloc-based implementation, currently the default on Linux
- a mimalloc-based implementation, currently the default on macOS and
Windows
- an implementation that defers to the system's standard allocator
(using the malloc() and free() calls), available as a fallback and for
experimentation
While jemalloc is the current default on Linux, our continuous
benchmarking infrastructure actually enables mimalloc instead.
Therefore, I've made a draft PR that switches our benchmarking to
jemalloc, so as to measure any concrete differences between the two:
https://github.com/apache/arrow/pull/41205
The results show a large number of C++ microbenchmarks with performance
regressions of large effect size under jemalloc, and a smaller number of
C++ microbenchmarks with improved performance. A
summary report with links to detailed results can be found here:
https://github.com/apache/arrow/runs/25745674261
With this in mind, I would like to propose that we switch the default to
mimalloc for all platforms. This would have several desirable effects:
- less variability between platforms
- mimalloc generally has a nicer, more consistent API and is easier to
work with (in particular, jemalloc's configuration scheme is somewhat
abstruse)
- potentially better performance than the status quo, or at least not
significantly worse performance
We would have to keep at least one CI job with jemalloc enabled, to make
sure we're not regressing in that regard.
What do you think?
Regards
Antoine.