Hi,

Could you try https://github.com/apache/arrow/pull/13373 ?
This will work with -DARROW_JEMALLOC=ON because it doesn't
override posix_memalign() in the system memory pool even
when -DARROW_JEMALLOC=ON is specified.

Thanks,
--
kou

In <[email protected]>
"Re: Custom default C++ memory pool on Linux, and/or interception/auditing of system pool" on Wed, 15 Jun 2022 08:38:54 +0900 (JST),
Sutou Kouhei <[email protected]> wrote:

> Hi,
>
> I think that compiler builtins aren't related. Could you try
> only with -DARROW_JEMALLOC=OFF?
>
> Thanks,
> --
> kou
>
> In <cack8hr7rx2ajm79ytccrulv6zlwcygestygqhmzebmpjeuv...@mail.gmail.com>
> "Re: Custom default C++ memory pool on Linux, and/or interception/auditing
> of system pool" on Tue, 14 Jun 2022 18:32:00 -0500,
> John Muehlhausen <[email protected]> wrote:
>
>> Thanks for the reply. I had disabled jemalloc
>> via ARROW_DEFAULT_MEMORY_POOL so that was not the issue.
>>
>> The issue was (I think) that the arrow lib I was using was built with
>> compiler builtins (such as __builtin_posix_memalign) so that even the
>> system default allocator wasn't able to be intercepted.
>>
>> One way to solve this is to build Arrow with -fno-builtin, but
>> unfortunately that disables a lot of builtins that a person may still
>> want. Since allocation is a whole family of functions and not just a few,
>> it is somewhat difficult to determine which builtins to selectively
>> disallow. It would be nice if some project (arrow? mimalloc?) made such
>> documentation for popular compilers that substitute builtins for allocation
>> routines.
>>
>> I opened an issue on mimalloc for this documentation... or at least a
>> warning about builtins for those using the interception techniques such as
>> LD_PRELOAD.
>>
>> -John
>>
>> On Tue, Jun 14, 2022 at 3:40 PM Sutou Kouhei <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> posix_memalign() in memory_pool.cc of libarrow-dev uses
>>> jemalloc's posix_memalign() (je_posix_memalign()). Because
>>> it's built with ARROW_JEMALLOC=ON (default) and
>>> JEMALLOC_MANGLE
>>>
>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.cc#L53
>>> . So we can't use mimalloc with LD_PRELOAD.
>>>
>>> The comment for JEMALLOC_MANGLE in
>>> memory_pool.cc said "Needed to support jemalloc 3 and 4" but
>>> we bundle jemalloc 5.2.1 now. So we can remove JEMALLOC_MANGLE.
>>>
>>> Could you open an issue on Jira
>>> https://issues.apache.org/jira/browse/ARROW to add support
>>> for overriding system memory pool's allocator by LD_PRELOAD?
>>> (Do you want to work on this?)
>>>
>>>
>>> Thanks,
>>> --
>>> kou
>>>
>>> In <cack8hr5ltedfwrat3flsdp1hq5bsoj+dcilvqjdzpdome29...@mail.gmail.com>
>>> "Custom default C++ memory pool on Linux, and/or interception/auditing
>>> of system pool" on Tue, 14 Jun 2022 09:06:51 -0500,
>>> John Muehlhausen <[email protected]> wrote:
>>>
>>> > Hello,
>>> >
>>> > This comment is regarding installation with `apt` on ubuntu 18.04 ...
>>> > `libarrow-dev/bionic,now 8.0.0-1 amd64`
>>> >
>>> > I'm a bit confused about the memory pool situation:
>>> >
>>> > * I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
>>> > `arrow::default_memory_pool()->backend_name() ==
>>> > arrow::system_memory_pool()->backend_name()`
>>> >
>>> > * I then LD_PRELOAD a customized (*) mimalloc according to the directions
>>> > at the mimalloc git repo and things like `strm->Reset(INT32_MAX);` seem not
>>> > to be hitting it... I figured that is a big enough chunk to jostle it into
>>> > doing something... `BufferOutputStream::Create(INT32_MAX)` is also not
>>> > intercepted by mimalloc. Is the "system" pool somehow going around the
>>> > typical allocation interfaces on linux? I built my own .so and linked it
>>> > to the app and malloc() is getting intercepted.
>>> >
>>> > * `arrow::mimalloc_memory_pool(&mmmp);` does return something... but
>>> > apparently not "my" mimalloc ... statically linked?
>>> >
>>> > * what is going on in Arrow with constructor (pre-main()) allocations?
>>> > Some of this does hit my LD_PRELOADed mimalloc
>>> >
>>> > * any way to get symbols for the apt-installed libs or would I need to
>>> > build from source to get backtrace with symbols? (for chasing down sources
>>> > of allocations)
>>> >
>>> > * what is the C++ lib equivalent of the following from the Python code? I
>>> > figure I could stop trying to understand the built-in/default allocators if
>>> > I could just replace them... but this may also intersect with my question
>>> > about constructors. Maybe I'd have to make sure my constructor runs first
>>> > to perform the switch-a-roo before anything else tries to use the default
>>> > pool?
>>> >
>>> > ```
>>> > namespace py {
>>> >
>>> > static std::mutex memory_pool_mutex;
>>> > static MemoryPool* default_python_pool = nullptr;
>>> >
>>> > void set_default_memory_pool(MemoryPool* pool) {
>>> >   std::lock_guard<std::mutex> guard(memory_pool_mutex);
>>> >   default_python_pool = pool;
>>> > }
>>> > ```
>>> >
>>> >
>>> > (*) the mimalloc customization: the main app has a weak reference that ends
>>> > up defined by the LD_PRELOAD mimalloc, where the function so-supplied
>>> > allows the app to install a function pointer (back to the main app) that
>>> > gets called (if defined) at various interesting points in mimalloc
>>> >
>>> >
>>> > Thanks,
>>> > John
>>>
