Sorry, that should have said "when Arrow builds jemalloc".  Here is
the command we send down (from ThirdPartyToolchain.cmake):

```
list(APPEND
     JEMALLOC_CONFIGURE_COMMAND
     "--prefix=${JEMALLOC_PREFIX}"
     "--libdir=${JEMALLOC_LIB_DIR}"
     "--with-jemalloc-prefix=je_arrow_"
     "--with-private-namespace=je_arrow_private_"
     "--without-export"
     "--disable-shared"
     # Don't override operator new()
     "--disable-cxx"
     "--disable-libdl"
     # See https://github.com/jemalloc/jemalloc/issues/1237
     "--disable-initial-exec-tls"
     ${EP_LOG_OPTIONS})
```
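
If the goal is mainly to track buffer allocations, the idea of a pool that funnels aligned allocations through posix_memalign and counts them can be sketched on its own. This is only an illustration and not Arrow's code: the class name CountingAlignedPool, its methods, and the 64-byte alignment constant are all assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical stand-in for a buffer pool; this is NOT arrow::MemoryPool.
class CountingAlignedPool {
 public:
  // Assumed alignment for this sketch.
  static constexpr std::size_t kAlignment = 64;

  // Returns nullptr on failure. The request goes through posix_memalign,
  // so a preloaded allocator that overrides posix_memalign would see it.
  void* Allocate(std::size_t size) {
    void* out = nullptr;
    if (posix_memalign(&out, kAlignment, size) != 0) {
      return nullptr;
    }
    bytes_allocated_ += static_cast<std::int64_t>(size);
    ++num_allocations_;
    return out;
  }

  // The caller passes the size back so the pool can keep its byte count.
  void Free(void* ptr, std::size_t size) {
    if (ptr == nullptr) {
      return;
    }
    free(ptr);
    bytes_allocated_ -= static_cast<std::int64_t>(size);
  }

  std::int64_t bytes_allocated() const { return bytes_allocated_; }
  std::int64_t num_allocations() const { return num_allocations_; }

 private:
  std::int64_t bytes_allocated_ = 0;   // bytes currently outstanding
  std::int64_t num_allocations_ = 0;   // successful Allocate() calls so far
};
```

In Arrow itself the analogous hook would be a MemoryPool passed explicitly to the call that allocates, for example the pool argument of BufferOutputStream::Create, rather than a class like the one above.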

On Tue, Jun 14, 2022 at 5:35 AM Weston Pace <weston.p...@gmail.com> wrote:
>
> I can try and give a more detailed answer later in the week but the
> gist of it is that Arrow manages all "buffer allocations" with a
> memory pool.  These are the allocations for the actual data in the
> arrays.  These are the allocations that use the memory pool configured
> by ARROW_DEFAULT_MEMORY_POOL.
>
> To avoid interfering with the user's allocations Arrow does not
> configure the system allocator at all.  So when Arrow builds it alters
> it slightly (using cmake variables I think) to be specific to Arrow.
> This might make it a bit tricky to get debug symbols for jemalloc but
> you could always build Arrow in debug mode and intercept the methods
> in memory_pool.cc if your focus is tracking allocations.
>
> Arrow still uses the system allocator for all non-buffer allocations.
> So, for example, when reading in a large IPC file, the majority of the
> data will be allocated by Arrow's memory pool.  However, the schema,
> and the wrapper array object itself will be allocated by the system
> allocator.  This is probably why switching the system allocator to
> jemalloc shows some, but not all, Arrow allocations happening there.
>
> On Tue, Jun 14, 2022 at 5:28 AM John Muehlhausen <j...@jgm.org> wrote:
> >
> > A code review has demonstrated that Arrow uses posix_memalign ... I do
> > believe mimalloc preload is "catching" this but I didn't tool it with my
> > customization.  Still interested in any guidance on the other points
> > raised, and sorry for some of this being noise.
> >
> > -John
> >
> > On Tue, Jun 14, 2022 at 9:06 AM John Muehlhausen <j...@jgm.org> wrote:
> >
> > > Hello,
> > >
> > > This comment is regarding installation with `apt` on ubuntu 18.04 ...
> > > `libarrow-dev/bionic,now 8.0.0-1 amd64`
> > >
> > > I'm a bit confused about the memory pool situation:
> > >
> > > * I run with `ARROW_DEFAULT_MEMORY_POOL=system` and check that
> > > `arrow::default_memory_pool()->backend_name() ==
> > > arrow::system_memory_pool()->backend_name()`
> > >
> > > * I then LD_PRELOAD a customized (*) mimalloc according to the directions
> > > > at the mimalloc git repo and things like `strm->Reset(INT32_MAX);` seem not
> > > to be hitting it... I figured that is a big enough chunk to jostle it into
> > > doing something... `BufferOutputStream::Create(INT32_MAX)` is also not
> > > intercepted by mimalloc.  Is the "system" pool somehow going around the
> > > typical allocation interfaces on linux?  I built my own .so and linked it
> > > to the app and malloc() is getting intercepted.
> > >
> > > * `arrow::mimalloc_memory_pool(&mmmp);` does return something... but
> > > apparently not "my" mimalloc ... statically linked?
> > >
> > > * what is going on in Arrow with constructor (pre-main()) allocations?
> > > Some of this does hit my LD_PRELOADed mimalloc
> > >
> > > * any way to get symbols for the apt-installed libs or would I need to
> > > build from source to get backtrace with symbols? (for chasing down sources
> > > of allocations)
> > >
> > > * what is the C++ lib equivalent of the following from the Python code?  I
> > > > figure I could stop trying to understand the built-in/default allocators if
> > > I could just replace them... but this may also intersect with my question
> > > about constructors.  Maybe I'd have to make sure my constructor runs first
> > > to perform the switch-a-roo before anything else tries to use the default
> > > pool?
> > >
> > > ```
> > > namespace py {
> > >
> > > static std::mutex memory_pool_mutex;
> > > static MemoryPool* default_python_pool = nullptr;
> > >
> > > void set_default_memory_pool(MemoryPool* pool) {
> > >   std::lock_guard<std::mutex> guard(memory_pool_mutex);
> > >   default_python_pool = pool;
> > > }
> > > ```
> > >
> > >
> > > (*) the mimalloc customization: the main app has a weak reference that
> > > ends up defined by the LD_PRELOAD mimalloc, where the function so-supplied
> > > allows the app to install a function pointer (back to the main app) that
> > > gets called (if defined) at various interesting points in mimalloc
> > >
> > >
> > > Thanks,
> > > John
> > >
