+1. I think the benefits outweigh the risks.

On Wed, Jun 5, 2024 at 3:05 PM Anja <anja.kef...@gmail.com> wrote:
>
> I did want to start off by acknowledging that all of the pros you listed
> for mimalloc are accurate.
>
> I also want to point out the times that people have been caught off-guard
> by the perceived higher memory usage of mimalloc compared to the
> alternatives:
> E.g. https://github.com/microsoft/mimalloc/issues/393
> https://github.com/apache/arrow/issues/37361
>
> From your investigation, Antoine, it does seem that the bulk of this is
> actually "pre-allocated virtual memory":
> https://github.com/apache/arrow/issues/40301 and thus people generally
> don't need to be concerned about it.
>
> I'm working on a blog post that will hopefully decrease the number of
> "mimalloc is over-allocating memory" issues we will get. I think I should
> get this ready in anticipation of the switch. We don't want a bunch of open
> issues about memory leaks.
>
> On Wed, 5 Jun 2024 at 08:18, Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > Hello,
> >
> > Arrow C++ features a MemoryPool abstraction that allows using different
> > allocators interchangeably. Several MemoryPool implementations are
> > provided with Arrow C++ (though one can also build their own):
> >
> > - a jemalloc-based implementation, currently the default on Linux
> > - a mimalloc-based implementation, currently the default on macOS and
> > Windows
> > - an implementation that defers to the system's standard allocator
> > (using the malloc() and free() calls), available as a fallback and for
> > experimentation
> >
> > While jemalloc is the current default on Linux, our continuous
> > benchmarking infrastructure actually enables mimalloc instead.
> > Therefore, I've made a draft PR that switches our benchmarking to
> > jemalloc, so as to measure any concrete differences between the two:
> > https://github.com/apache/arrow/pull/41205
> >
> > The results show that switching to jemalloc causes a large number of
> > performance drops with large effect sizes on the C++ microbenchmarks.
> > A smaller number of C++ microbenchmarks show improved performance. A
> > summary report with links to detailed results can be found here:
> > https://github.com/apache/arrow/runs/25745674261
> >
> >
> > With this in mind, I would like to propose that we switch the default to
> > mimalloc for all platforms. This would have several desirable effects:
> >
> > - less variability between platforms
> > - mimalloc generally has a nicer, more consistent API and is easier to
> > work with (in particular, jemalloc's configuration scheme is slightly
> > abstruse)
> > - potentially better performance, or at least not significantly worse,
> > than the status quo
> >
> > We would have to keep at least one CI job with jemalloc enabled, to make
> > sure we're not regressing in that regard.
> >
> > What do you think?
> >
> > Regards
> >
> > Antoine.
> >
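
For readers unfamiliar with the idea, the quoted message describes MemoryPool as an abstraction that lets different allocators be used interchangeably, with one implementation deferring to malloc() and free(). The sketch below illustrates that pattern only. The class and function names (MemoryPool, SystemPool, AllocateAndReport) and their signatures are hypothetical and deliberately simplified; this is not Arrow's actual interface, and a jemalloc- or mimalloc-backed pool would be another subclass calling into that library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical, simplified pool interface (an assumption for illustration,
// not Arrow's real arrow::MemoryPool API).
class MemoryPool {
 public:
  virtual ~MemoryPool() = default;
  virtual void* Allocate(std::size_t size) = 0;
  virtual void Free(void* ptr, std::size_t size) = 0;
  virtual std::string backend_name() const = 0;

  // Bytes currently handed out by this pool and not yet freed.
  std::int64_t bytes_allocated() const { return bytes_allocated_; }

 protected:
  std::int64_t bytes_allocated_ = 0;
};

// Fallback pool that defers to the system allocator via malloc()/free().
class SystemPool : public MemoryPool {
 public:
  void* Allocate(std::size_t size) override {
    void* ptr = std::malloc(size);
    if (ptr != nullptr) {
      bytes_allocated_ += static_cast<std::int64_t>(size);
    }
    return ptr;
  }

  void Free(void* ptr, std::size_t size) override {
    if (ptr == nullptr) return;
    std::free(ptr);
    bytes_allocated_ -= static_cast<std::int64_t>(size);
  }

  std::string backend_name() const override { return "system"; }
};

// Caller code depends only on the interface, so swapping the allocator
// means passing a different MemoryPool subclass.
std::int64_t AllocateAndReport(MemoryPool& pool, std::size_t size) {
  void* buf = pool.Allocate(size);
  assert(buf != nullptr);
  std::int64_t in_use = pool.bytes_allocated();  // usage while buf is live
  pool.Free(buf, size);
  return in_use;
}
```

Because callers only see the base class, changing the default allocator as proposed would not require changes to code written against the pool interface.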
