Hi,

posix_memalign() in memory_pool.cc of libarrow-dev uses jemalloc's posix_memalign() (je_posix_memalign()) because the package is built with ARROW_JEMALLOC=ON (the default) and JEMALLOC_MANGLE:

  https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.cc#L53

So the system pool's allocations never go through the posix_memalign() symbol that LD_PRELOAD can override, and we can't use mimalloc with LD_PRELOAD.
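
As a rough illustration of why the preload has no effect, here is a minimal sketch of what name mangling does at compile time. It is not Arrow's or jemalloc's actual code: je_posix_memalign() below is a stand-in defined locally, the #define only imitates the effect of the mangling header, and AllocateAligned() and JeCallCount() are hypothetical names. It assumes a POSIX system where <cstdlib> declares posix_memalign().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for jemalloc's prefixed entry point. In the real
// setup this symbol comes from the bundled jemalloc, not from libc.
static int g_je_calls = 0;

int je_posix_memalign(void** out, std::size_t alignment, std::size_t size) {
  ++g_je_calls;
  // Delegates to libc only so that this sketch runs without jemalloc.
  return posix_memalign(out, alignment, size);
}

// Assumption: this imitates the effect of the mangling header. From here on,
// every textual use of posix_memalign is rewritten to je_posix_memalign.
#define posix_memalign je_posix_memalign

// Illustrative counterpart of an aligned allocation inside a memory pool.
void* AllocateAligned(std::size_t alignment, std::size_t size) {
  // posix_memalign requires a power-of-two multiple of sizeof(void*).
  assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
  void* out = nullptr;
  // After preprocessing this is a direct call to je_posix_memalign, so a
  // preloaded library that defines posix_memalign is never consulted.
  if (posix_memalign(&out, alignment, size) != 0) return nullptr;
  return out;
}

// Number of times the stand-in jemalloc entry point has been called.
int JeCallCount() { return g_je_calls; }
```

Calling AllocateAligned(64, 256) bumps the counter in the stand-in, which shows that the call site was bound to the prefixed name when it was compiled rather than looked up as posix_memalign at load time.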

The comment for JEMALLOC_MANGLE in memory_pool.cc says "Needed to support jemalloc 3 and 4", but we bundle jemalloc 5.2.1 now, so we can remove JEMALLOC_MANGLE.

Could you open an issue on Jira https://issues.apache.org/jira/browse/ARROW to add support for overriding the system memory pool's allocator with LD_PRELOAD? (Do you want to work on this?)

Thanks,
-- 
kou

In <cack8hr5ltedfwrat3flsdp1hq5bsoj+dcilvqjdzpdome29...@mail.gmail.com>
  "Custom default C++ memory pool on Linux, and/or interception/auditing of system pool" on Tue, 14 Jun 2022 09:06:51 -0500,
  John Muehlhausen <j...@jgm.org> wrote:

> Hello,
>
> This comment is regarding installation with `apt` on ubuntu 18.04 ...
> `libarrow-dev/bionic,now 8.0.0-1 amd64`
>
> I'm a bit confused about the memory pool situation:
>
> * I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
> `arrow::default_memory_pool()->backend_name() ==
> arrow::system_memory_pool()->backend_name()`
>
> * I then LD_PRELOAD a customized (*) mimalloc according to the directions
> at the mimalloc git repo and things like `strm->Reset(INT32_MAX);` seem not
> to be hitting it... I figured that is a big enough chunk to jostle it into
> doing something... `BufferOutputStream::Create(INT32_MAX)` is also not
> intercepted by mimalloc. Is the "system" pool somehow going around the
> typical allocation interfaces on linux? I built my own .so and linked it
> to the app and malloc() is getting intercepted.
>
> * `arrow::mimalloc_memory_pool(&mmmp);` does return something... but
> apparently not "my" mimalloc ... statically linked?
>
> * what is going on in Arrow with constructor (pre-main()) allocations?
> Some of this does hit my LD_PRELOADed mimalloc
>
> * any way to get symbols for the apt-installed libs or would I need to
> build from source to get backtrace with symbols? (for chasing down sources
> of allocations)
>
> * what is the C++ lib equivalent of the following from the Python code?
> I figure I could stop trying to understand the built-in/default allocators if
> I could just replace them... but this may also intersect with my question
> about constructors. Maybe I'd have to make sure my constructor runs first
> to perform the switch-a-roo before anything else tries to use the default
> pool?
>
> ```
> namespace py {
>
> static std::mutex memory_pool_mutex;
> static MemoryPool* default_python_pool = nullptr;
>
> void set_default_memory_pool(MemoryPool* pool) {
>   std::lock_guard<std::mutex> guard(memory_pool_mutex);
>   default_python_pool = pool;
> }
> ```
>
>
> (*) the mimalloc customization: the main app has a weak reference that ends
> up defined by the LD_PRELOAD mimalloc, where the function so-supplied
> allows the app to install a function pointer (back to the main app) that
> gets called (if defined) at various interesting points in mimalloc
>
>
> Thanks,
> John
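
For readers unfamiliar with the weak-reference arrangement described in the (*) footnote, the application side might look roughly like the sketch below. All names are hypothetical: custom_alloc_install_hook(), AllocEventFn, OnAllocEvent() and InstallAllocHook() are not part of mimalloc or Arrow, and the weak attribute assumes GCC or Clang on Linux.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical callback type that the customized allocator would invoke.
using AllocEventFn = void (*)(const char* event, std::size_t size);

// Weak declaration (hypothetical symbol name). If no loaded library, for
// example an LD_PRELOADed customized allocator, defines this symbol, its
// address resolves to null instead of causing a link error.
extern "C" void custom_alloc_install_hook(AllocEventFn fn)
    __attribute__((weak));

static std::size_t g_events_seen = 0;

// Lives in the main app; the allocator would call back into it.
static void OnAllocEvent(const char* event, std::size_t /*size*/) {
  assert(event != nullptr);
  ++g_events_seen;
}

// Returns true if the hook was installed, false if the library that
// provides custom_alloc_install_hook is not loaded.
bool InstallAllocHook() {
  if (custom_alloc_install_hook == nullptr) return false;
  custom_alloc_install_hook(&OnAllocEvent);
  return true;
}
```

When the program runs without the preloaded library, InstallAllocHook() returns false and the app continues unhooked. With the library preloaded, the weak reference binds to its definition and the callback gets registered.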