I can try to give a more detailed answer later in the week, but the gist is that Arrow manages all "buffer allocations" with a memory pool. These are the allocations for the actual data in the arrays, and they are the ones that use the memory pool configured by ARROW_DEFAULT_MEMORY_POOL.
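To make the idea of pool-managed buffers concrete, here is a toy C++ model. It is not Arrow's code: `ToyPool`, `ToyArray`, and `RunDemo` are made-up names, and the 64-byte `posix_memalign` call is only an assumption about how a system-backed pool might obtain memory. The point of the sketch is that only the data buffer is counted by the pool, while the wrapper object and its metadata come from ordinary `operator new`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <stdlib.h>  // posix_memalign, free (POSIX)
#include <string>
#include <utility>

// Toy stand-in for a memory pool. NOT arrow::MemoryPool; names are hypothetical.
class ToyPool {
 public:
  static constexpr std::size_t kAlignment = 64;  // assumed alignment

  // Returns true on success and stores an aligned block in *out.
  bool Allocate(std::int64_t size, std::uint8_t** out) {
    void* p = nullptr;
    if (size < 0 ||
        posix_memalign(&p, kAlignment, static_cast<std::size_t>(size)) != 0) {
      return false;
    }
    *out = static_cast<std::uint8_t*>(p);
    bytes_allocated_ += size;  // the pool tracks buffer bytes only
    return true;
  }

  void Free(std::uint8_t* buffer, std::int64_t size) {
    assert(bytes_allocated_ >= size);
    free(buffer);
    bytes_allocated_ -= size;
  }

  std::int64_t bytes_allocated() const { return bytes_allocated_; }

 private:
  std::int64_t bytes_allocated_ = 0;
};

// Toy array wrapper: the object and its metadata (name) are allocated by
// operator new / the system allocator; only `data` comes from the pool.
struct ToyArray {
  ToyPool* pool;
  std::uint8_t* data;
  std::int64_t size;
  std::string name;

  ToyArray(ToyPool* p, std::int64_t n, std::string nm)
      : pool(p), data(nullptr), size(0), name(std::move(nm)) {
    if (pool->Allocate(n, &data)) size = n;
  }
  ~ToyArray() {
    if (data != nullptr) pool->Free(data, size);
  }
  ToyArray(const ToyArray&) = delete;
  ToyArray& operator=(const ToyArray&) = delete;
};

struct DemoResult {
  std::int64_t bytes_while_alive;
  std::int64_t bytes_after_free;
  bool aligned;
};

// Allocates one toy array and records what the pool saw.
DemoResult RunDemo() {
  ToyPool pool;
  DemoResult r{};
  {
    auto arr = std::make_unique<ToyArray>(&pool, 1024, "values");
    r.bytes_while_alive = pool.bytes_allocated();
    r.aligned =
        reinterpret_cast<std::uintptr_t>(arr->data) % ToyPool::kAlignment == 0;
  }  // arr destroyed here; its buffer goes back through the pool
  r.bytes_after_free = pool.bytes_allocated();
  return r;
}
```

Calling `RunDemo()` from a `main` reports 1024 bytes in the pool while the array is alive and 0 afterwards. The `ToyArray` object and its `name` string never show up in the pool's counter, which is why a tool that watches only one allocator sees only part of the picture.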
To avoid interfering with the user's allocations, Arrow does not configure the system allocator at all. So when Arrow builds jemalloc, it alters it slightly (using CMake variables, I think) to be specific to Arrow. This might make it a bit tricky to get debug symbols for jemalloc, but you could always build Arrow in debug mode and intercept the methods in memory_pool.cc if your focus is tracking allocations.

Arrow still uses the system allocator for all non-buffer allocations. So, for example, when reading in a large IPC file, the majority of the data will be allocated by Arrow's memory pool. However, the schema and the wrapper array object itself will be allocated by the system allocator. This is probably why switching the system allocator to jemalloc shows some, but not all, Arrow allocations happening there.

On Tue, Jun 14, 2022 at 5:28 AM John Muehlhausen <j...@jgm.org> wrote:
>
> A code review has demonstrated that Arrow uses posix_memalign ... I do
> believe mimalloc preload is "catching" this but I didn't tool it with my
> customization. Still interested in any guidance on the other points
> raised, and sorry for some of this being noise.
>
> -John
>
> On Tue, Jun 14, 2022 at 9:06 AM John Muehlhausen <j...@jgm.org> wrote:
>
> > Hello,
> >
> > This comment is regarding installation with `apt` on ubuntu 18.04 ...
> > `libarrow-dev/bionic,now 8.0.0-1 amd64`
> >
> > I'm a bit confused about the memory pool situation:
> >
> > * I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
> > `arrow::default_memory_pool()->backend_name() ==
> > arrow::system_memory_pool()->backend_name()`
> >
> > * I then LD_PRELOAD a customized (*) mimalloc according to the directions
> > at the mimalloc git repo and things like `strm->Reset(INT32_MAX);` seem not
> > to be hitting it... I figured that is a big enough chunk to jostle it into
> > doing something... `BufferOutputStream::Create(INT32_MAX)` is also not
> > intercepted by mimalloc. Is the "system" pool somehow going around the
> > typical allocation interfaces on linux? I built my own .so and linked it
> > to the app and malloc() is getting intercepted.
> >
> > * `arrow::mimalloc_memory_pool(&mmmp);` does return something... but
> > apparently not "my" mimalloc ... statically linked?
> >
> > * what is going on in Arrow with constructor (pre-main()) allocations?
> > Some of this does hit my LD_PRELOADed mimalloc
> >
> > * any way to get symbols for the apt-installed libs or would I need to
> > build from source to get backtrace with symbols? (for chasing down sources
> > of allocations)
> >
> > * what is the C++ lib equivalent of the following from the Python code? I
> > figure I could stop trying to understand the built-in/default allocators if
> > I could just replace them... but this may also intersect with my question
> > about constructors. Maybe I'd have to make sure my constructor runs first
> > to perform the switch-a-roo before anything else tries to use the default
> > pool?
> >
> > ```
> > namespace py {
> >
> > static std::mutex memory_pool_mutex;
> > static MemoryPool* default_python_pool = nullptr;
> >
> > void set_default_memory_pool(MemoryPool* pool) {
> >   std::lock_guard<std::mutex> guard(memory_pool_mutex);
> >   default_python_pool = pool;
> > }
> > ```
> >
> >
> > (*) the mimalloc customization: the main app has a weak reference that
> > ends up defined by the LD_PRELOAD mimalloc, where the function so-supplied
> > allows the app to install a function pointer (back to the main app) that
> > gets called (if defined) at various interesting points in mimalloc
> >
> >
> > Thanks,
> > John
> >