My best guess at the moment is that the Arrow lib I'm using was built with
a compiler that had something like __builtin_posix_memalign in effect?

I say this because using __builtin_malloc has the same deleterious
effect on my own .so.

On Tue, Jun 14, 2022 at 10:53 AM John Muehlhausen <j...@jgm.org> wrote:

> I'm using ARROW_DEFAULT_MEMORY_POOL=system
>
> Based on a review of memory_pool.cc I expect this to become posix_memalign
> calls on Linux
>
> When I call posix_memalign in a .so that I created and linked with my
> app, using LD_PRELOAD=/usr/local/lib/libmimalloc.so to run the app, these
> calls get forwarded to mi_posix_memalign (I know because I put a printf in
> there and rebuilt mimalloc)... note, I'm not talking about Arrow's built-in
> mimalloc.
>
> Maybe Arrow's mimalloc is keeping the LD_PRELOAD of my custom mimalloc
> from taking effect?  How is mimalloc included in Arrow?  When I
> call arrow::mimalloc_memory_pool() I do get an Ok status, so it is in the
> build I'm using from `apt`
>
> -John
>
> On Tue, Jun 14, 2022 at 10:37 AM Weston Pace <weston.p...@gmail.com>
> wrote:
>
>> Sorry, that should have said "when Arrow builds jemalloc".  Here is
>> the command we send down (from ThirdPartyToolchain.cmake):
>>
>> ```
>> JEMALLOC_CONFIGURE_COMMAND
>> "--prefix=${JEMALLOC_PREFIX}"
>> "--libdir=${JEMALLOC_LIB_DIR}"
>> "--with-jemalloc-prefix=je_arrow_"
>> "--with-private-namespace=je_arrow_private_"
>> "--without-export"
>> "--disable-shared"
>> # Don't override operator new()
>> "--disable-cxx"
>> "--disable-libdl"
>> # See https://github.com/jemalloc/jemalloc/issues/1237
>> "--disable-initial-exec-tls"
>> ${EP_LOG_OPTIONS})
>> list(APPEND
>> ```
>>
>> On Tue, Jun 14, 2022 at 5:35 AM Weston Pace <weston.p...@gmail.com>
>> wrote:
>> >
>> > I can try and give a more detailed answer later in the week but the
>> > gist of it is that Arrow manages all "buffer allocations" with a
>> > memory pool.  These are the allocations for the actual data in the
>> > arrays.  These are the allocations that use the memory pool configured
>> > by ARROW_DEFAULT_MEMORY_POOL.
>> >
>> > To avoid interfering with the user's allocations Arrow does not
>> > configure the system allocator at all.  So when Arrow builds it alters
>> > it slightly (using cmake variables I think) to be specific to Arrow.
>> > This might make it a bit tricky to get debug symbols for jemalloc but
>> > you could always build Arrow in debug mode and intercept the methods
>> > in memory_pool.cc if your focus is tracking allocations.
>> >
>> > Arrow still uses the system allocator for all non-buffer allocations.
>> > So, for example, when reading in a large IPC file, the majority of the
>> > data will be allocated by Arrow's memory pool.  However, the schema,
>> > and the wrapper array object itself will be allocated by the system
>> > allocator.  This is probably why switching the system allocator to
>> > jemalloc shows some, but not all, Arrow allocations happening there.
>> >
>> > On Tue, Jun 14, 2022 at 5:28 AM John Muehlhausen <j...@jgm.org> wrote:
>> > >
>> > > A code review has demonstrated that Arrow uses posix_memalign ... I do
>> > > believe mimalloc preload is "catching" this but I didn't tool it with
>> > > my customization.  Still interested in any guidance on the other points
>> > > raised, and sorry for some of this being noise.
>> > >
>> > > -John
>> > >
>> > > On Tue, Jun 14, 2022 at 9:06 AM John Muehlhausen <j...@jgm.org> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > This comment is regarding installation with `apt` on ubuntu 18.04 ...
>> > > > `libarrow-dev/bionic,now 8.0.0-1 amd64`
>> > > >
>> > > > I'm a bit confused about the memory pool situation:
>> > > >
>> > > > * I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
>> > > > `arrow::default_memory_pool()->backend_name() ==
>> > > > arrow::system_memory_pool()->backend_name()`
>> > > >
>> > > > * I then LD_PRELOAD a customized (*) mimalloc according to the
>> > > > directions at the mimalloc git repo and things like
>> > > > `strm->Reset(INT32_MAX);` seem not to be hitting it... I figured that
>> > > > is a big enough chunk to jostle it into doing something...
>> > > > `BufferOutputStream::Create(INT32_MAX)` is also not intercepted by
>> > > > mimalloc.  Is the "system" pool somehow going around the typical
>> > > > allocation interfaces on linux?  I built my own .so and linked it to
>> > > > the app and malloc() is getting intercepted.
>> > > >
>> > > > * `arrow::mimalloc_memory_pool(&mmmp);` does return something... but
>> > > > apparently not "my" mimalloc ... statically linked?
>> > > >
>> > > > * what is going on in Arrow with constructor (pre-main()) allocations?
>> > > > Some of this does hit my LD_PRELOADed mimalloc
>> > > >
>> > > > * any way to get symbols for the apt-installed libs or would I need to
>> > > > build from source to get backtrace with symbols? (for chasing down
>> > > > sources of allocations)
>> > > >
>> > > > * what is the C++ lib equivalent of the following from the Python code?
>> > > > I figure I could stop trying to understand the built-in/default
>> > > > allocators if I could just replace them... but this may also intersect
>> > > > with my question about constructors.  Maybe I'd have to make sure my
>> > > > constructor runs first to perform the switch-a-roo before anything else
>> > > > tries to use the default pool?
>> > > >
>> > > > ```
>> > > > namespace py {
>> > > >
>> > > > static std::mutex memory_pool_mutex;
>> > > > static MemoryPool* default_python_pool = nullptr;
>> > > >
>> > > > void set_default_memory_pool(MemoryPool* pool) {
>> > > >   std::lock_guard<std::mutex> guard(memory_pool_mutex);
>> > > >   default_python_pool = pool;
>> > > > }
>> > > > ```
>> > > >
>> > > >
>> > > > (*) the mimalloc customization: the main app has a weak reference that
>> > > > ends up defined by the LD_PRELOAD mimalloc, where the function
>> > > > so-supplied allows the app to install a function pointer (back to the
>> > > > main app) that gets called (if defined) at various interesting points
>> > > > in mimalloc
>> > > >
>> > > >
>> > > > Thanks,
>> > > > John
>> > > >
>>
>
