On Dec 7, 2005, at 9:44 AM, Gleb Natapov wrote:

On Tue, Dec 06, 2005 at 11:07:44AM -0500, Brian Barrett wrote:
On Dec 6, 2005, at 10:53 AM, Gleb Natapov wrote:

On Tue, Dec 06, 2005 at 08:33:32AM -0700, Tim S. Woodall wrote:
Also, memfree hooks decrease cache efficiency; the better solution would be to catch brk() system calls and remove memory from the cache only then, but there is no way to do it for now.

We are looking at other options, including catching brk/munmap system calls, and will be experimenting w/ these on the trunk.

This will be really interesting. How are you going to catch brk/munmap without kernel help? Last time I checked, preload tricks don't work if the syscall is done from inside libc itself.

All of the tricks we are looking at assume that nothing in libc calls
munmap.

glibc does call mmap/munmap internally for big allocations as strace of
this program shows:

#include <stdlib.h>

int main ()
{
        /* big enough that glibc services it with mmap()/munmap() */
        void *p = malloc (1024*1024);
        free (p);
        return 0;
}

Ah, yes, I wasn't clear. On Linux, we actually ship our own version of ptmalloc2 (the allocator used by glibc on Linux). We use the standard linker search order tricks to have the linker choose our versions of malloc, calloc, realloc, valloc, and free, which are from ptmalloc2. We've modified our version of ptmalloc2 such that any time it calls mmap or sbrk with a positive number, it then immediately allows the cache to know about the allocation. Any time it's about to call munmap or sbrk with a negative number, it informs the cache code before giving the memory back to the OS. We also catch mmap and munmap so that we can track when the user calls mmap / munmap. Note that we play with ptmalloc2's code such that it calls our mmap (which either uses the syscall interface directly or calls __mmap depending on what the system supports), so we don't intercept that call to mmap twice or anything like that.
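To make that concrete, here's a minimal sketch of the linker-trick side of it. The hook name cache_notify_release() is purely illustrative (it is not the real rcache interface), and the sketch assumes a Linux box where issuing the raw system call through syscall() is acceptable:

#include <stddef.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical hook into the registration cache -- name made up for
   illustration. */
extern void cache_notify_release(void *addr, size_t len);

/* Our munmap() wins the link-order race against libc's, so user calls
   land here.  Notify the cache first, then issue the real system call
   directly so we don't recurse back into this wrapper. */
int munmap(void *addr, size_t len)
{
    cache_notify_release(addr, len);
    return syscall(SYS_munmap, addr, len);
}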

This works pretty well (like I said - it's worked fine for LAM and MPICH-gm for years), but it requires the user either to use the wrapper compilers or to add -lmpi -lorte -lopal to the link line (i.e., shared library dependencies can't be relied on to pull in libopal.so); otherwise our ptmalloc2 / mmap / munmap isn't used. We can detect that this has happened pretty easily, and then we fall back to the pipelined RDMA code, which doesn't offer the same performance but also doesn't have a pinning problem.
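The detection can be as simple as the sketch below - a trial malloc/free at startup tells us whether our interposed free() actually ran. The names here (intercept_seen, memory_hooks_available) are made up for illustration, not the actual OPAL symbols:

#include <stdlib.h>

/* Our interposed free() wrapper (not shown) sets this to 1. */
volatile int intercept_seen = 0;

/* If the wrapper never ran, the hooks aren't in place and we should
   select the pipelined-RDMA protocol instead of leave-pinned RDMA. */
int memory_hooks_available(void)
{
    void *p = malloc(16);
    free(p);
    return intercept_seen;
}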

We can successfully catch free() calls from inside libc without any problems.  The LAM/MPI team and Myricom (with MPICH-gm) have been doing this for many years.  On the small percentage of MPI applications that require some linker tricks (some of the commercial apps are this way), we won't be able to intercept any free/munmap calls, so we're going to fall back to our RDMA pipeline algorithm.

Yes, but catching free is not good enough. This way we sometimes evict cache entries that may safely remain in the cache. Ideally we should be able to catch events that return memory to the OS (munmap/brk) and remove the memory from the cache only then.

This is essentially what we do on Linux - we only tell the rcache code about allocations / deallocations that actually take memory from, or give memory back to, the operating system.
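As a rough illustration of where those notifications sit, here's a shim around sbrk() of the kind the modified allocator might call; cache_notify_acquire() / cache_notify_release() are the same made-up hook names as in the earlier sketch, not the real rcache API:

#include <stdint.h>
#include <unistd.h>

extern void cache_notify_acquire(void *addr, size_t len);
extern void cache_notify_release(void *addr, size_t len);

/* Called by the modified allocator in place of sbrk(); plain free()
   never comes through here, only the OS-facing path does. */
void *allocator_sbrk(intptr_t delta)
{
    if (delta < 0) {
        /* Heap is about to shrink: tell the cache before the pages
           go back to the OS. */
        void *top = sbrk(0);
        cache_notify_release((char *) top + delta, (size_t) -delta);
    }
    void *old = sbrk(delta);
    if (delta > 0 && old != (void *) -1) {
        /* Heap grew: these addresses are now safe to register. */
        cache_notify_acquire(old, (size_t) delta);
    }
    return old;
}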

On Mac OS X / Darwin, due to its two-level namespace, we can't replace malloc / free with a customized version of the Darwin allocator like we could with ptmalloc2. There are some things you can do to simulate such behavior, but it requires linking in a flat namespace and doing some other things that nearly caused the Darwin engineers to pass out when I was talking to them about said tricks. So instead, we use the Darwin hooks for catching malloc / free / etc. It's not optimal, but it's the best we can do in the situation. And it doesn't force us to link all OMPI applications in a flat namespace, which is always nice. Of course, we still intercept mmap / munmap in the traditional linker-tricks style. But again, there are very few function calls in libSystem.dylib that call mmap that we care about (malloc / free are already taken care of by the standard hooks), so this doesn't cause a problem.
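For the Darwin side, a rough sketch of the zone-hook shape is below - illustrative names only (cache_notify_release() again is made up), glossing over the details of the actual implementation, and assuming the malloc_zone_t function table from <malloc/malloc.h> is writable, as it was on the OS X releases of that era:

#include <malloc/malloc.h>

extern void cache_notify_release(void *addr, size_t len);

static void (*real_zone_free)(malloc_zone_t *zone, void *ptr);

static void hooked_free(malloc_zone_t *zone, void *ptr)
{
    /* Let the cache know about the block before it goes away. */
    cache_notify_release(ptr, malloc_size(ptr));
    real_zone_free(zone, ptr);
}

void install_darwin_free_hook(void)
{
    malloc_zone_t *zone = malloc_default_zone();
    real_zone_free = zone->free;
    zone->free = hooked_free;
}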

Hopefully this made some sense. If not, on to the next round of e-mails :).

Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

