pitrou opened a new issue, #50083: URL: https://github.com/apache/arrow/issues/50083
### Describe the enhancement requested

The [memray memory profiler](https://github.com/bloomberg/memray) works by interposing certain dynamic symbols in the profiled process, replacing them with its own functions that collect memory allocation data. To the best of my knowledge, it currently only recognizes system C calls such as `malloc`, `mmap`...

When a third-party allocator like mimalloc or jemalloc is being used, as Arrow does by default, memray does not see the logical allocation calls made through these allocators' APIs (because they are not interposed), but only the raw memory reservations that they issue using system routines. This can lead people using memray to think that a given Arrow workload (or any workload using such allocators, really) uses an inordinate amount of memory, while the reported memory mostly represents non-committed virtual memory that the allocator keeps for performance reasons.

Concrete example in GH-40301: we allocate a number of 1 KiB buffers from mimalloc, but memray sees a similar number of 64 MiB calls to `mmap`.

We [discussed](https://github.com/bloomberg/memray/issues/577) how to enhance memray so as to account for the corresponding logical allocations, and we came to the conclusion that it requires Arrow to expose API calls that can be dynamically interposed. Since we typically build against a static `libmimalloc.a`, the mimalloc symbols cannot be exposed (at least, I cannot seem to get this to work on Ubuntu). This means we need to define our own symbols wrapping the mimalloc APIs.

### Component(s)

C++
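
As a rough illustration of what "our own symbols wrapping the mimalloc APIs" could look like, here is a minimal sketch. Everything named in it is hypothetical: `SKETCH_EXPORT`, `arrow_sketch_allocate`, `arrow_sketch_free` and the `sketch_backend` namespace are not existing Arrow or memray names, and the backend uses `std::malloc`/`std::free` as a stand-in so that the snippet compiles without mimalloc. In a real build the backend would presumably call `mi_malloc` and `mi_free` from the statically linked `libmimalloc.a`.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical export macro (assumption, not an Arrow macro): keep the symbol
// in the dynamic symbol table and prevent inlining, so that callers go through
// an entry point a profiler could interpose.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_EXPORT extern "C" __attribute__((visibility("default"), noinline))
#else
#define SKETCH_EXPORT extern "C"
#endif

// Stand-in for the statically linked allocator. With mimalloc, these two
// functions would call mi_malloc(size) and mi_free(ptr) instead.
namespace sketch_backend {
inline void* Allocate(std::size_t size) { return std::malloc(size); }
inline void Free(void* ptr) { std::free(ptr); }
}  // namespace sketch_backend

// Hypothetical wrapper symbols. All logical allocations would be routed
// through these functions, so a tool that interposes them sees the requested
// size rather than the allocator's large mmap reservations.
SKETCH_EXPORT void* arrow_sketch_allocate(std::size_t size) {
  return sketch_backend::Allocate(size);
}

SKETCH_EXPORT void arrow_sketch_free(void* ptr) {
  sketch_backend::Free(ptr);
}
```

Whether a profiler can actually hook such wrappers likely depends on how the library is built and linked (for example, symbol visibility settings and whether internal calls are bound directly to the definition), so this only shows the shape of the idea.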
