+1. I think the benefits outweigh the risks.
On Wed, Jun 5, 2024 at 3:05 PM Anja <anja.kef...@gmail.com> wrote:
>
> I did want to start off by acknowledging that all of the pros you listed
> for mimalloc are accurate.
>
> I did want to contribute the times that people have been caught off-guard
> by the perceived increased memory allocation of mimalloc compared to the
> alternatives:
> E.g. https://github.com/microsoft/mimalloc/issues/393
> https://github.com/apache/arrow/issues/37361
>
> From your investigation Antoine, it does seem that the bulk of this is
> actually "pre-allocated virtual memory":
> https://github.com/apache/arrow/issues/40301 and thus people don't need to
> be generally concerned about this.
>
> I'm working on a blogpost that will hopefully decrease the number of
> "mimalloc is over-allocating memory" issues we will get. I think I should
> get this ready in anticipation of the switch. We don't want a bunch of open
> issues about memory leaks.
>
> On Wed, 5 Jun 2024 at 08:18, Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > Hello,
> >
> > Arrow C++ features a MemoryPool abstraction that allows using different
> > allocators interchangeably. Several MemoryPool implementations are
> > provided with Arrow C++ (though one can also build their own):
> >
> > - a jemalloc-based implementation, currently the default on Linux
> > - a mimalloc-based implementation, currently the default on macOS and
> >   Windows
> > - an implementation that defers to the system's standard allocator
> >   (using the malloc() and free() calls), available as a fallback and for
> >   experimentation
> >
> > While jemalloc is the current default on Linux, our continuous
> > benchmarking infrastructure actually enables mimalloc instead.
> > Therefore, I've made a draft PR that switches our benchmarking to
> > jemalloc, so as to measure any concrete differences between the two:
> > https://github.com/apache/arrow/pull/41205
> >
> > The results show that there is a large number of performance drops with
> > large effect sizes on the C++ microbenchmarks. There is also a smaller
> > number of C++ microbenchmarks with improved performance results. A
> > summary report with links to detailed results can be found here:
> > https://github.com/apache/arrow/runs/25745674261
> >
> >
> > With this in mind, I would like to propose that we switch the default to
> > mimalloc for all platforms. This would have several desirable effects:
> >
> > - less variability between platforms
> > - mimalloc generally has a nicer, more consistent API and is easier to
> >   work with (in particular, jemalloc's configuration scheme is slightly
> >   abstruse)
> > - potentially better performance, or at least not significantly worse,
> >   than the status quo
> >
> > We would have to keep at least one CI job with jemalloc enabled, to make
> > sure we're not regressing in that regard.
> >
> > What do you think?
> >
> > Regards
> >
> > Antoine.
> >
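
For reference, the MemoryPool abstraction described in the quoted message is also exposed through the Python bindings. The following is a minimal sketch, assuming a pyarrow installation; it is not part of the original proposal, and `describe_pools` is a hypothetical helper name. It tries each of the three built-in pools, skips any that the local build does not provide, and reports how many bytes each one has handed out while a small array is alive:

```python
try:
    import pyarrow as pa
except ImportError:  # assumption: let the sketch load even where pyarrow is absent
    pa = None


def describe_pools():
    """Return {pool name: bytes allocated} for each pool this build provides."""
    if pa is None:
        return {}
    factories = {
        "jemalloc": pa.jemalloc_memory_pool,
        "mimalloc": pa.mimalloc_memory_pool,
        "system": pa.system_memory_pool,
    }
    report = {}
    for name, factory in factories.items():
        try:
            pool = factory()
        except NotImplementedError:
            # This allocator was not compiled into the local Arrow build.
            continue
        # Allocate a small array from this specific pool.
        arr = pa.array(list(range(1000)), type=pa.int64(), memory_pool=pool)
        # Read the pool's own accounting while the array is still alive.
        report[name] = pool.bytes_allocated()
        del arr
    return report


if __name__ == "__main__":
    if pa is not None:
        print("default backend:", pa.default_memory_pool().backend_name)
    for name, nbytes in describe_pools().items():
        print(f"{name}: {nbytes} bytes allocated")
```

`pa.default_memory_pool().backend_name` shows which of these pools the local build picks by default, which is the setting the proposal would change on Linux. Note that `bytes_allocated()` reflects what Arrow requested from the pool, not the process's virtual or resident memory, so it will not show the pre-allocated virtual memory mentioned in the thread.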