Hi,

Could you try https://github.com/apache/arrow/pull/13373 ?
With this change, LD_PRELOAD should work even with
-DARROW_JEMALLOC=ON, because jemalloc no longer overrides
posix_memalign() in the system memory pool.
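
To check the result, a small probe like the following might help.
It is only a sketch, not code from the pull request: it makes the
same kind of libc call that the system memory pool makes
(posix_memalign()), so a preloaded allocator that overrides
posix_memalign() should see it. The function names and the 64-byte
alignment are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign(), free()

// Assumption: 64-byte alignment, chosen for illustration.
constexpr std::size_t kAlignment = 64;

// Allocate `size` bytes through posix_memalign(); returns nullptr on failure.
// An LD_PRELOADed allocator that overrides posix_memalign() should see this call.
void* AllocateAligned(std::size_t size) {
  assert(size > 0);  // a zero-byte request would not tell us anything
  void* out = nullptr;
  if (posix_memalign(&out, kAlignment, size) != 0) {
    return nullptr;
  }
  return out;
}

// True if `ptr` is aligned to kAlignment.
bool IsAligned(const void* ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % kAlignment == 0;
}

// Allocate, verify alignment, and free; returns true on success.
bool ProbeOnce(std::size_t size) {
  void* p = AllocateAligned(size);
  if (p == nullptr) {
    return false;
  }
  const bool ok = IsAligned(p);
  free(p);
  return ok;
}
```

If the preloaded mimalloc sees a ProbeOnce() call from your app but
still does not see Arrow's allocations, the Arrow build is probably
still routing posix_memalign() somewhere else.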

Thanks,
-- 
kou

In <[email protected]>
  "Re: Custom default C++ memory pool on Linux, and/or interception/auditing of 
system pool" on Wed, 15 Jun 2022 08:38:54 +0900 (JST),
  Sutou Kouhei <[email protected]> wrote:

> Hi,
> 
> I think that compiler builtins aren't related. Could you try
> only with -DARROW_JEMALLOC=OFF?
> 
> Thanks,
> --
> kou
> 
> In <cack8hr7rx2ajm79ytccrulv6zlwcygestygqhmzebmpjeuv...@mail.gmail.com>
>   "Re: Custom default C++ memory pool on Linux, and/or interception/auditing 
> of system pool" on Tue, 14 Jun 2022 18:32:00 -0500,
>   John Muehlhausen <[email protected]> wrote:
> 
>> Thanks for the reply.  I had disabled jemalloc
>> via ARROW_DEFAULT_MEMORY_POOL so that was not the issue.
>> 
>> The issue was (I think) that the arrow lib I was using was built with
>> compiler builtins (such as __builtin_posix_memalign) so that even the
>> system default allocator wasn't able to be intercepted.
>> 
>> One way to solve this is to build Arrow with -fno-builtin, but
>> unfortunately that disables a lot of builtins that a person may still
>> want.  Since allocation is a whole family of functions and not just a few,
>> it is somewhat difficult to determine which builtins to selectively
>> disallow.  It would be nice if some project (arrow? mimalloc?) made such
>> documentation for popular compilers that substitute builtins for allocation
>> routines.
>> 
>> I opened an issue on mimalloc for this documentation... or at least a
>> warning about builtins for those using the interception techniques such as
>> LD_PRELOAD.
>> 
>> -John
>> 
>> On Tue, Jun 14, 2022 at 3:40 PM Sutou Kouhei <[email protected]> wrote:
>> 
>>> Hi,
>>>
>>> posix_memalign() in memory_pool.cc of libarrow-dev resolves to
>>> jemalloc's posix_memalign() (je_posix_memalign()), because the
>>> package is built with ARROW_JEMALLOC=ON (the default) and
>>> JEMALLOC_MANGLE is defined:
>>>
>>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/memory_pool.cc#L53
>>>
>>> So we can't use mimalloc with LD_PRELOAD.
>>>
>>> The comment for JEMALLOC_MANGLE in
>>> memory_pool.cc says "Needed to support jemalloc 3 and 4", but
>>> we bundle jemalloc 5.2.1 now. So we can remove JEMALLOC_MANGLE.
>>>
>>> Could you open an issue on Jira
>>> https://issues.apache.org/jira/browse/ARROW to add support
>>> for overriding system memory pool's allocator by LD_PRELOAD?
>>> (Do you want to work on this?)
>>>
>>>
>>> Thanks,
>>> --
>>> kou
>>>
>>> In <cack8hr5ltedfwrat3flsdp1hq5bsoj+dcilvqjdzpdome29...@mail.gmail.com>
>>>   "Custom default C++ memory pool on Linux, and/or interception/auditing
>>> of system pool" on Tue, 14 Jun 2022 09:06:51 -0500,
>>>   John Muehlhausen <[email protected]> wrote:
>>>
>>> > Hello,
>>> >
>>> > This comment is regarding installation with `apt` on ubuntu 18.04 ...
>>> > `libarrow-dev/bionic,now 8.0.0-1 amd64`
>>> >
>>> > I'm a bit confused about the memory pool situation:
>>> >
>>> > * I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
>>> > `arrow::default_memory_pool()->backend_name() ==
>>> > arrow::system_memory_pool()->backend_name()`
>>> >
>>> > * I then LD_PRELOAD a customized (*) mimalloc according to the directions
>>> > at the mimalloc git repo and things like `strm->Reset(INT32_MAX);` seem not
>>> > to be hitting it... I figured that is a big enough chunk to jostle it into
>>> > doing something... `BufferOutputStream::Create(INT32_MAX)` is also not
>>> > intercepted by mimalloc.  Is the "system" pool somehow going around the
>>> > typical allocation interfaces on linux?  I built my own .so and linked it
>>> > to the app and malloc() is getting intercepted.
>>> >
>>> > * `arrow::mimalloc_memory_pool(&mmmp);` does return something... but
>>> > apparently not "my" mimalloc ... statically linked?
>>> >
>>> > * what is going on in Arrow with constructor (pre-main()) allocations?
>>> > Some of this does hit my LD_PRELOADed mimalloc
>>> >
>>> > * any way to get symbols for the apt-installed libs or would I need to
>>> > build from source to get backtrace with symbols? (for chasing down sources
>>> > of allocations)
>>> >
>>> > * what is the C++ lib equivalent of the following from the Python code? I
>>> > figure I could stop trying to understand the built-in/default allocators if
>>> > I could just replace them... but this may also intersect with my question
>>> > about constructors.  Maybe I'd have to make sure my constructor runs first
>>> > to perform the switch-a-roo before anything else tries to use the default
>>> > pool?
>>> >
>>> > ```
>>> > namespace py {
>>> >
>>> > static std::mutex memory_pool_mutex;
>>> > static MemoryPool* default_python_pool = nullptr;
>>> >
>>> > void set_default_memory_pool(MemoryPool* pool) {
>>> >   std::lock_guard<std::mutex> guard(memory_pool_mutex);
>>> >   default_python_pool = pool;
>>> > }
>>> > ```
>>> >
>>> >
>>> > (*) the mimalloc customization: the main app has a weak reference that ends
>>> > up defined by the LD_PRELOAD mimalloc, where the function so-supplied
>>> > allows the app to install a function pointer (back to the main app) that
>>> > gets called (if defined) at various interesting points in mimalloc
>>> >
>>> >
>>> > Thanks,
>>> > John
>>>
