A minimal build using the following seems to have solved my problem. The various -fno-builtin-* params are guesswork based largely on alloc-override.c from mimalloc. It would be nice if someone documented how to turn off whole classes of builtins for each popular compiler, or if compilers supported this directly (e.g. something like -fno-builtingroup-allocation)... turning off ALL builtins seems too heavy-handed.

cmake -E env \
  CFLAGS="-fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc \
    -fno-builtin-free -fno-builtin-reallocf -fno-builtin-malloc_size \
    -fno-builtin-malloc_usable_size -fno-builtin-valloc -fno-builtin-vfree \
    -fno-builtin-malloc_good_size -fno-builtin-posix_memalign \
    -fno-builtin-aligned_alloc -fno-builtin-cfree -fno-builtin-pvalloc \
    -fno-builtin-reallocarray -fno-builtin-reallocarr -fno-builtin-memalign \
    -fno-builtin-_aligned_malloc -fno-builtin-__libc_malloc \
    -fno-builtin-__libc_calloc -fno-builtin-__libc_realloc \
    -fno-builtin-__libc_free -fno-builtin-__libc_cfree \
    -fno-builtin-__libc_valloc -fno-builtin-__libc_pvalloc \
    -fno-builtin-__libc_memalign -fno-builtin-__posix_memalign \
    -fno-builtin-operator_new -fno-builtin-operator_delete" \
  CXXFLAGS="-fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc \
    -fno-builtin-free -fno-builtin-reallocf -fno-builtin-malloc_size \
    -fno-builtin-malloc_usable_size -fno-builtin-valloc -fno-builtin-vfree \
    -fno-builtin-malloc_good_size -fno-builtin-posix_memalign \
    -fno-builtin-aligned_alloc -fno-builtin-cfree -fno-builtin-pvalloc \
    -fno-builtin-reallocarray -fno-builtin-reallocarr -fno-builtin-memalign \
    -fno-builtin-_aligned_malloc -fno-builtin-__libc_malloc \
    -fno-builtin-__libc_calloc -fno-builtin-__libc_realloc \
    -fno-builtin-__libc_free -fno-builtin-__libc_cfree \
    -fno-builtin-__libc_valloc -fno-builtin-__libc_pvalloc \
    -fno-builtin-__libc_memalign -fno-builtin-__posix_memalign \
    -fno-builtin-operator_new -fno-builtin-operator_delete" \
  cmake --preset ninja-debug-minimal -DARROW_JEMALLOC=OFF -DARROW_MIMALLOC=OFF \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=/usr/local ..

On Tue, Jun 14, 2022 at 12:36 PM John Muehlhausen <j...@jgm.org> wrote:

> My best guess at this moment is that the Arrow lib I'm using was built
> with a compiler that had something like __builtin_posix_memalign in effect
> ??
>
> I say this because deploying __builtin_malloc has the same deleterious
> effect on my own .so
>
> On Tue, Jun 14, 2022 at 10:53 AM John Muehlhausen <j...@jgm.org> wrote:
>
>> I'm using ARROW_DEFAULT_MEMORY_POOL=system
>>
>> Based on a review of memory_pool.cc I expect this to become
>> posix_memalign calls on Linux
>>
>> When I call posix_memalign in a .so that I created and linked with my
>> app, using LD_PRELOAD=/usr/local/lib/libmimalloc.so to run the app, these
>> calls get forwarded to mi_posix_memalign (because I threw a printf in there
>> and re-built mimalloc)... note, I'm not talking about Arrow's built-in
>> mimalloc.
>>
>> Maybe Arrow's mimalloc is keeping the LD_PRELOAD of my custom mimalloc
>> from taking effect? How is mimalloc included in Arrow? When I
>> call arrow::mimalloc_memory_pool() I do get an Ok status, so it is in the
>> build I'm using from `apt`
>>
>> -John
>>
>> On Tue, Jun 14, 2022 at 10:37 AM Weston Pace <weston.p...@gmail.com>
>> wrote:
>>
>>> Sorry, that should have said "when Arrow builds jemalloc". Here is
>>> the command we send down (from ThirdPartyToolchain.cmake):
>>>
>>> ```
>>> JEMALLOC_CONFIGURE_COMMAND
>>> "--prefix=${JEMALLOC_PREFIX}"
>>> "--libdir=${JEMALLOC_LIB_DIR}"
>>> "--with-jemalloc-prefix=je_arrow_"
>>> "--with-private-namespace=je_arrow_private_"
>>> "--without-export"
>>> "--disable-shared"
>>> # Don't override operator new()
>>> "--disable-cxx"
>>> "--disable-libdl"
>>> # See https://github.com/jemalloc/jemalloc/issues/1237
>>> "--disable-initial-exec-tls"
>>> ${EP_LOG_OPTIONS})
>>> list(APPEND
>>> ```
>>>
>>> On Tue, Jun 14, 2022 at 5:35 AM Weston Pace <weston.p...@gmail.com>
>>> wrote:
>>> >
>>> > I can try and give a more detailed answer later in the week but the
>>> > gist of it is that Arrow manages all "buffer allocations" with a
>>> > memory pool. These are the allocations for the actual data in the
>>> > arrays.
>>> > These are the allocations that use the memory pool configured
>>> > by ARROW_DEFAULT_MEMORY_POOL.
>>> >
>>> > To avoid interfering with the user's allocations Arrow does not
>>> > configure the system allocator at all. So when Arrow builds it alters
>>> > it slightly (using cmake variables I think) to be specific to Arrow.
>>> > This might make it a bit tricky to get debug symbols for jemalloc but
>>> > you could always build Arrow in debug mode and intercept the methods
>>> > in memory_pool.cc if your focus is tracking allocations.
>>> >
>>> > Arrow still uses the system allocator for all non-buffer allocations.
>>> > So, for example, when reading in a large IPC file, the majority of the
>>> > data will be allocated by Arrow's memory pool. However, the schema,
>>> > and the wrapper array object itself will be allocated by the system
>>> > allocator. This is probably why switching the system allocator to
>>> > jemalloc shows some, but not all, Arrow allocations happening there.
>>> >
>>> > On Tue, Jun 14, 2022 at 5:28 AM John Muehlhausen <j...@jgm.org> wrote:
>>> > >
>>> > > A code review has demonstrated that Arrow uses posix_memalign ... I do
>>> > > believe mimalloc preload is "catching" this but I didn't tool it with my
>>> > > customization. Still interested in any guidance on the other points
>>> > > raised, and sorry for some of this being noise.
>>> > >
>>> > > -John
>>> > >
>>> > > On Tue, Jun 14, 2022 at 9:06 AM John Muehlhausen <j...@jgm.org> wrote:
>>> > >
>>> > > > Hello,
>>> > > >
>>> > > > This comment is regarding installation with `apt` on ubuntu 18.04 ...
>>> > > > `libarrow-dev/bionic,now 8.0.0-1 amd64`
>>> > > >
>>> > > > I'm a bit confused about the memory pool situation:
>>> > > >
>>> > > > * I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
>>> > > > `arrow::default_memory_pool()->backend_name() ==
>>> > > > arrow::system_memory_pool()->backend_name()`
>>> > > >
>>> > > > * I then LD_PRELOAD a customized (*) mimalloc according to the directions
>>> > > > at the mimalloc git repo and things like `strm->Reset(INT32_MAX);` seem not
>>> > > > to be hitting it... I figured that is a big enough chunk to jostle it into
>>> > > > doing something... `BufferOutputStream::Create(INT32_MAX)` is also not
>>> > > > intercepted by mimalloc. Is the "system" pool somehow going around the
>>> > > > typical allocation interfaces on linux? I built my own .so and linked it
>>> > > > to the app and malloc() is getting intercepted.
>>> > > >
>>> > > > * `arrow::mimalloc_memory_pool(&mmmp);` does return something... but
>>> > > > apparently not "my" mimalloc ... statically linked?
>>> > > >
>>> > > > * what is going on in Arrow with constructor (pre-main()) allocations?
>>> > > > Some of this does hit my LD_PRELOADed mimalloc
>>> > > >
>>> > > > * any way to get symbols for the apt-installed libs or would I need to
>>> > > > build from source to get backtrace with symbols? (for chasing down sources
>>> > > > of allocations)
>>> > > >
>>> > > > * what is the C++ lib equivalent of the following from the Python code? I
>>> > > > figure I could stop trying to understand the built-in/default allocators if
>>> > > > I could just replace them... but this may also intersect with my question
>>> > > > about constructors. Maybe I'd have to make sure my constructor runs first
>>> > > > to perform the switch-a-roo before anything else tries to use the default
>>> > > > pool?
>>> > > >
>>> > > > ```
>>> > > > namespace py {
>>> > > >
>>> > > > static std::mutex memory_pool_mutex;
>>> > > > static MemoryPool* default_python_pool = nullptr;
>>> > > >
>>> > > > void set_default_memory_pool(MemoryPool* pool) {
>>> > > >   std::lock_guard<std::mutex> guard(memory_pool_mutex);
>>> > > >   default_python_pool = pool;
>>> > > > }
>>> > > > ```
>>> > > >
>>> > > >
>>> > > > (*) the mimalloc customization: the main app has a weak reference that
>>> > > > ends up defined by the LD_PRELOAD mimalloc, where the function so-supplied
>>> > > > allows the app to install a function pointer (back to the main app) that
>>> > > > gets called (if defined) at various interesting points in mimalloc
>>> > > >
>>> > > >
>>> > > > Thanks,
>>> > > > John
>>> > > >
>>>
>>
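
For anyone who wants to repeat the posix_memalign interception check from the quoted messages, here is a rough sketch of the kind of probe I mean. The function name probe_posix_memalign is made up for illustration (it is not part of Arrow or mimalloc), and the 64-byte alignment mentioned in the usage comment is only my assumption about what the "system" pool would request, not something confirmed in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign and free (POSIX; not part of ISO C++)

// Hypothetical helper (not from Arrow or mimalloc): makes one aligned
// allocation through posix_memalign and reports whether the call succeeded
// and returned a pointer with the requested alignment.
bool probe_posix_memalign(std::size_t alignment, std::size_t size) {
  void* p = nullptr;
  // POSIX requires alignment to be a power of two and a multiple of
  // sizeof(void*); the call returns 0 on success.
  if (posix_memalign(&p, alignment, size) != 0 || p == nullptr) {
    return false;
  }
  // Store through a volatile pointer to discourage the optimizer from
  // treating the allocation as unused.
  static_cast<volatile unsigned char*>(p)[0] = 1;
  const bool aligned = reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
  free(p);
  return aligned;
}

// Example use from an application or a .so (alignment value is an assumption):
//   bool ok = probe_posix_memalign(64, 1 << 20);
```

If the probe is called from the app (or from a .so linked into it) while running under LD_PRELOAD=/usr/local/lib/libmimalloc.so, a printf added to mi_posix_memalign should fire when the call is actually being interposed. The probe only confirms that the allocation worked and was aligned; it cannot tell by itself which allocator served it.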