I want to start by acknowledging that all of the pros you listed for
mimalloc are accurate.

I also want to point out cases where people have been caught off guard by
mimalloc's perceived higher memory usage compared to the alternatives,
e.g.:
https://github.com/microsoft/mimalloc/issues/393
https://github.com/apache/arrow/issues/37361

From your investigation, Antoine, it does seem that the bulk of this is
actually "pre-allocated virtual memory"
(https://github.com/apache/arrow/issues/40301), so people generally don't
need to be concerned about it.
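
To illustrate the distinction for readers who hit this, here is a minimal
sketch of my own (not taken from mimalloc or Arrow) that reserves address
space without touching it and compares virtual size with resident size. It
assumes Linux, since it reads /proc/self/statm; the function names and the
64 MiB figure are made up for the example.

```python
import mmap
import os

STATM_PATH = "/proc/self/statm"  # Linux-only; an assumption of this sketch


def read_vm_and_rss_bytes():
    """Return (virtual_size, resident_size) of this process in bytes."""
    with open(STATM_PATH) as f:
        fields = f.read().split()
    # The first two fields are total program size and resident set size,
    # both counted in pages.
    return int(fields[0]) * mmap.PAGESIZE, int(fields[1]) * mmap.PAGESIZE


def reserve_without_touching(reserve_bytes=64 * 1024 * 1024):
    """Map anonymous memory, never write to it, and report the growth.

    Returns (virtual_growth, resident_growth) in bytes.
    """
    vm_before, rss_before = read_vm_and_rss_bytes()
    region = mmap.mmap(-1, reserve_bytes)  # anonymous mapping, left untouched
    try:
        vm_after, rss_after = read_vm_and_rss_bytes()
    finally:
        region.close()
    return vm_after - vm_before, rss_after - rss_before


if __name__ == "__main__":
    if os.path.exists(STATM_PATH):
        vm_growth, rss_growth = reserve_without_touching()
        print(f"virtual size grew by {vm_growth / 2**20:.1f} MiB")
        print(f"resident size grew by {rss_growth / 2**20:.1f} MiB")
    else:
        print("This sketch needs /proc/self/statm (Linux).")
```

On a typical Linux system the virtual size should grow by roughly the
reserved amount while the resident size barely moves, which is the kind of
gap that tends to get reported as a leak.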

I'm working on a blog post that will hopefully reduce the number of
"mimalloc is over-allocating memory" issues we get. I think I should have
it ready in anticipation of the switch; we don't want a bunch of open
issues about supposed memory leaks.

On Wed, 5 Jun 2024 at 08:18, Antoine Pitrou <anto...@python.org> wrote:

>
> Hello,
>
> Arrow C++ features a MemoryPool abstraction that allows using different
> allocators interchangeably. Several MemoryPool implementations are
> provided with Arrow C++ (though one can also build their own):
>
> - a jemalloc-based implementation, currently the default on Linux
> - a mimalloc-based implementation, currently the default on macOS and
> Windows
> - an implementation that defers to the system's standard allocator
> (using the malloc() and free() calls), available as a fallback and for
> experimentation
>
> While jemalloc is the current default on Linux, our continuous
> benchmarking infrastructure actually enables mimalloc instead.
> Therefore, I've made a draft PR that switches our benchmarking to
> jemalloc, so as to measure any concrete differences between the two:
> https://github.com/apache/arrow/pull/41205
>
> The results show that there is a large number of performance drops with
> large effect sizes on the C++ microbenchmarks. There is also a smaller
> number of C++ microbenchmarks with improved performance results. A
> summary report with links to detailed results can be found here:
> https://github.com/apache/arrow/runs/25745674261
>
>
> With this in mind, I would like to propose that we switch the default to
> mimalloc for all platforms. This would have several desirable effects:
>
> - less variability between platforms
> - mimalloc generally has a nicer, more consistent API and is easier to
> work with (in particular, jemalloc's configuration scheme is slightly
> abstruse)
> - potentially better performance, or at least not significantly worse,
> than the status quo
>
> We would have to keep at least one CI job with jemalloc enabled, to make
> sure we're not regressing in that regard.
>
> What do you think?
>
> Regards
>
> Antoine.
>
