My best guess at this moment is that the Arrow lib I'm using was built with a compiler that had something like __builtin_posix_memalign in effect ??
I say this because deploying __builtin_malloc has the same deleterious effect on my own .so.

On Tue, Jun 14, 2022 at 10:53 AM John Muehlhausen <j...@jgm.org> wrote:
> I'm using ARROW_DEFAULT_MEMORY_POOL=system
>
> Based on a review of memory_pool.cc I expect this to become posix_memalign
> calls on Linux.
>
> When I call posix_memalign in a .so that I created and linked with my
> app, using LD_PRELOAD=/usr/local/lib/libmimalloc.so to run the app, these
> calls get forwarded to mi_posix_memalign (because I threw a printf in there
> and re-built mimalloc)... note, I'm not talking about Arrow's built-in
> mimalloc.
>
> Maybe Arrow's mimalloc is keeping the LD_PRELOAD of my custom mimalloc
> from taking effect? How is mimalloc included in Arrow? When I
> call arrow::mimalloc_memory_pool() I do get an Ok status, so it is in the
> build I'm using from `apt`.
>
> -John
>
> On Tue, Jun 14, 2022 at 10:37 AM Weston Pace <weston.p...@gmail.com>
> wrote:
>
>> Sorry, that should have said "when Arrow builds jemalloc". Here is
>> the command we send down (from ThirdPartyToolchain.cmake):
>>
>> ```
>> JEMALLOC_CONFIGURE_COMMAND
>>     "--prefix=${JEMALLOC_PREFIX}"
>>     "--libdir=${JEMALLOC_LIB_DIR}"
>>     "--with-jemalloc-prefix=je_arrow_"
>>     "--with-private-namespace=je_arrow_private_"
>>     "--without-export"
>>     "--disable-shared"
>>     # Don't override operator new()
>>     "--disable-cxx"
>>     "--disable-libdl"
>>     # See https://github.com/jemalloc/jemalloc/issues/1237
>>     "--disable-initial-exec-tls"
>>     ${EP_LOG_OPTIONS})
>> list(APPEND
>> ```
>>
>> On Tue, Jun 14, 2022 at 5:35 AM Weston Pace <weston.p...@gmail.com>
>> wrote:
>> >
>> > I can try and give a more detailed answer later in the week but the
>> > gist of it is that Arrow manages all "buffer allocations" with a
>> > memory pool. These are the allocations for the actual data in the
>> > arrays. These are the allocations that use the memory pool configured
>> > by ARROW_DEFAULT_MEMORY_POOL.
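For reference, here is a minimal standalone C++ sketch of a `posix_memalign`-backed allocate/free pair, which is roughly what John expects the `system` backend to reduce to on Linux. The 64-byte alignment and the helper names (`AlignedAllocate`, `AlignedFree`, `IsAligned`) are assumptions chosen for illustration, not code taken from memory_pool.cc. In a dynamically linked build the call resolves to the ordinary `posix_memalign` symbol. That is the symbol a preloaded mimalloc overrides, as John observed with `mi_posix_memalign`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <stdlib.h>  // posix_memalign (POSIX, not ISO C++)

// Assumed alignment for illustration only. posix_memalign requires a power
// of two that is also a multiple of sizeof(void*).
constexpr std::size_t kAlignment = 64;

// Returns a kAlignment-aligned block of `size` bytes, or nullptr on failure.
void* AlignedAllocate(std::size_t size) {
  void* out = nullptr;
  // posix_memalign reports failure through its return value, not errno.
  if (posix_memalign(&out, kAlignment, size) != 0) {
    return nullptr;
  }
  return out;
}

// Memory obtained from posix_memalign is released with plain free().
void AlignedFree(void* ptr) { std::free(ptr); }

// True if `ptr` sits on an `alignment`-byte boundary.
bool IsAligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

If an LD_PRELOADed allocator does not see calls like this one coming from a library, the usual suspects are that the library was built against a differently named allocator entry point or never reaches this path for the allocation in question.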
>> >
>> > To avoid interfering with the user's allocations, Arrow does not
>> > configure the system allocator at all. So when Arrow builds it alters
>> > it slightly (using cmake variables I think) to be specific to Arrow.
>> > This might make it a bit tricky to get debug symbols for jemalloc but
>> > you could always build Arrow in debug mode and intercept the methods
>> > in memory_pool.cc if your focus is tracking allocations.
>> >
>> > Arrow still uses the system allocator for all non-buffer allocations.
>> > So, for example, when reading in a large IPC file, the majority of the
>> > data will be allocated by Arrow's memory pool. However, the schema
>> > and the wrapper array object itself will be allocated by the system
>> > allocator. This is probably why switching the system allocator to
>> > jemalloc shows some, but not all, Arrow allocations happening there.
>> >
>> > On Tue, Jun 14, 2022 at 5:28 AM John Muehlhausen <j...@jgm.org> wrote:
>> > >
>> > > A code review has demonstrated that Arrow uses posix_memalign ... I do
>> > > believe mimalloc preload is "catching" this but I didn't tool it with my
>> > > customization. Still interested in any guidance on the other points
>> > > raised, and sorry for some of this being noise.
>> > >
>> > > -John
>> > >
>> > > On Tue, Jun 14, 2022 at 9:06 AM John Muehlhausen <j...@jgm.org> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > This comment is regarding installation with `apt` on Ubuntu 18.04 ...
>> > > > `libarrow-dev/bionic,now 8.0.0-1 amd64`
>> > > >
>> > > > I'm a bit confused about the memory pool situation:
>> > > >
>> > > > * I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
>> > > > `arrow::default_memory_pool()->backend_name() ==
>> > > > arrow::system_memory_pool()->backend_name()`
>> > > >
>> > > > * I then LD_PRELOAD a customized (*) mimalloc according to the directions
>> > > > at the mimalloc git repo and things like `strm->Reset(INT32_MAX);` seem not
>> > > > to be hitting it... I figured that is a big enough chunk to jostle it into
>> > > > doing something... `BufferOutputStream::Create(INT32_MAX)` is also not
>> > > > intercepted by mimalloc. Is the "system" pool somehow going around the
>> > > > typical allocation interfaces on Linux? I built my own .so and linked it
>> > > > to the app and malloc() is getting intercepted.
>> > > >
>> > > > * `arrow::mimalloc_memory_pool(&mmmp);` does return something... but
>> > > > apparently not "my" mimalloc ... statically linked?
>> > > >
>> > > > * What is going on in Arrow with constructor (pre-main()) allocations?
>> > > > Some of this does hit my LD_PRELOADed mimalloc.
>> > > >
>> > > > * Is there any way to get symbols for the apt-installed libs, or would I
>> > > > need to build from source to get a backtrace with symbols? (for chasing
>> > > > down sources of allocations)
>> > > >
>> > > > * What is the C++ lib equivalent of the following from the Python code? I
>> > > > figure I could stop trying to understand the built-in/default allocators if
>> > > > I could just replace them... but this may also intersect with my question
>> > > > about constructors. Maybe I'd have to make sure my constructor runs first
>> > > > to perform the switch-a-roo before anything else tries to use the default
>> > > > pool?
>> > > >
>> > > > ```
>> > > > namespace py {
>> > > >
>> > > > static std::mutex memory_pool_mutex;
>> > > > static MemoryPool* default_python_pool = nullptr;
>> > > >
>> > > > void set_default_memory_pool(MemoryPool* pool) {
>> > > >   std::lock_guard<std::mutex> guard(memory_pool_mutex);
>> > > >   default_python_pool = pool;
>> > > > }
>> > > > ```
>> > > >
>> > > >
>> > > > (*) the mimalloc customization: the main app has a weak reference that
>> > > > ends up defined by the LD_PRELOAD mimalloc, where the function so-supplied
>> > > > allows the app to install a function pointer (back to the main app) that
>> > > > gets called (if defined) at various interesting points in mimalloc
>> > > >
>> > > >
>> > > > Thanks,
>> > > > John
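One way to get the effect of the Python `set_default_memory_pool` snippet in an application's own C++ code is a mutex-guarded, process-wide pool pointer that the application reads and passes along explicitly. The sketch below is self-contained and uses hypothetical names (`Pool`, `CountingPool`, `app::SetDefaultPool`, `app::DefaultPool`). It is not Arrow's C++ API. With Arrow itself, the assumption is that you would hand your chosen `arrow::MemoryPool*` to the functions that accept a pool argument, rather than replacing the library's internal default. The counting wrapper also illustrates the kind of interception Weston suggests for tracking allocations.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>

// Hypothetical minimal pool interface (NOT arrow::MemoryPool).
class Pool {
 public:
  virtual ~Pool() = default;
  virtual void* Allocate(std::size_t size) = 0;
  virtual void Free(void* ptr, std::size_t size) = 0;
};

// Forwards to malloc/free and tracks outstanding bytes, in the spirit of
// intercepting pool methods to see where allocations come from.
class CountingPool : public Pool {
 public:
  void* Allocate(std::size_t size) override {
    void* ptr = std::malloc(size);
    if (ptr != nullptr) {
      bytes_.fetch_add(static_cast<std::int64_t>(size));
    }
    return ptr;
  }

  void Free(void* ptr, std::size_t size) override {
    if (ptr == nullptr) return;
    std::free(ptr);
    bytes_.fetch_sub(static_cast<std::int64_t>(size));
  }

  std::int64_t bytes_allocated() const { return bytes_.load(); }

 private:
  std::atomic<std::int64_t> bytes_{0};
};

namespace app {

std::mutex pool_mutex;
Pool* default_pool = nullptr;  // nullptr means "use the fallback"

// Mirrors the Python snippet: swap the process-wide pool under a lock.
void SetDefaultPool(Pool* pool) {
  std::lock_guard<std::mutex> guard(pool_mutex);
  default_pool = pool;
}

// Returns the installed pool, or a lazily constructed fallback if none is set.
Pool* DefaultPool() {
  static CountingPool fallback;
  std::lock_guard<std::mutex> guard(pool_mutex);
  return default_pool != nullptr ? default_pool : &fallback;
}

}  // namespace app
```

Usage: install the pool early in `main()` with `app::SetDefaultPool(&my_pool)` and pass `app::DefaultPool()` wherever an allocation site takes a pool. Because only code that asks `app::DefaultPool()` is affected, this sidesteps the pre-main ordering question for the application's own allocations. It does nothing for allocations a library makes through its own default.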