brian: 1st point: propose remove opal/mca/memory/darwin (memory hooks
on OS X). Rationale:
- mvapi support is gone
- gm would be only user
- no one is supporting the code anymore (it ain't broke, but...)
--> patrick says: no problem. only myri osx customers have a special
mpich-mx, so it's ok.
--> jeff will svn rm mca/memory/darwin
discussion about current state of ptmalloc2
- only really useful for benchmarks (i.e., --mca mpi_leave_pinned 1)
- why have it in the way for apps that don't use mpi_leave_pinned?
- it gets in the way of MX (we "sorta" get away with it)
- also, we can't use ptmalloc2 for sun -- would be nice to do
something that they can use
- also remember that we hacked our copy of ptmalloc2 to make it work
nicely (e.g., because OF deregister calls malloc/free)
- note that our ptmalloc2 hacks are basically equivalent to mallopt:
we rarely return memory to the OS (e.g., very large allocations,
when ptmalloc uses its munmap case)
--> brian will double check this point
4 proposals:
1. patrick proposes to use the MMU notifiers -- likely to be in linux
2.6.27
- network driver will need to implement reg cache functionality
- these MMU notifiers will not be visible to OMPI; OMPI simply
*always* registers (a system call) and the driver implements the
cache and does the de-register for you when the memory is freed
- gleb asks: don't we want to avoid the system call when possible?
- patrick: a single syscall can be/is cheaper than a reg cache
lookup in user space
2. patrick also proposes dlmalloc
- not as efficient as ptmalloc2 (no fine-grained thread locks)
- but is more robust and simpler than ptmalloc2 (mpich-mx switched
to it long ago)
- has the same linker issues as ptmalloc2 (e.g., will be problematic
with apps that require their own allocator)
--> better for longer term (e.g., OMPI v1.4) because dlmalloc
handles large numbers of short malloc/free's better than
ptmalloc*
--> upgrading to dlmalloc is also subject to points at bottom of
these notes (don't call free() during de-register code paths)
3. brian proposes mallopt
- patrick says: you have to check if registering memory is on the
stack. what do we do now?
- neither brian nor galen remembers offhand; we'll need to check
- we will have problems with apps that do lots of small allocations,
but still better than ptmalloc2 because can turn off mallopt via
MCA param (i.e., just tell users: "don't use mpi_leave_pinned")
instead of recompiling/reinstalling OMPI to disable ptmalloc2
4. patrick also mentions: can simply use pipeline (take the bw perf
hit). Unfortunately, not feasible for benchmarks. :-(
---------------------------
For v1.3, gravitating towards the following: leave ptmalloc2 as a
component in the v1.3 tarball, but don't build it unless explicitly
requested, and ensure that the mallopt() protocol stuff works.
- note that the mallopt code is currently enabled by 2nd mca param
- patrick: no guarantee that malloc will comply; it's only a hint.
need to have a run-time test to ensure that it works: set the trim
threshold to large. then malloc something just over the
threshold and free it, and see if munmap hooks were called.
- brian: we'll need to add the hooks for munmap (probably move them
from where they are currently located)
- patrick: what about case like CHARMM where they have their own
allocator and don't support mallopt() hints?
- brian: same as today -- if you provide your own allocator,
leave_pinned doesn't work. benefit here is that if you're *not*
using leave_pinned, then don't have heavyweight ptmalloc2 in the
way. but you are hosed if you try to have your own allocator with
leave_pinned.
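
A rough sketch of the kind of run-time check patrick describes above,
assuming glibc's mallopt() on Linux. This is NOT the OMPI code and it
is a variation on what was discussed: it has no munmap hooks, so it
uses "did the program break shrink after free()?" as a stand-in signal,
and it disables trimming outright (-1) instead of picking a large
threshold. The function name hints_keep_memory is made up for
illustration.

```c
#define _DEFAULT_SOURCE   /* for sbrk() under strict -std= modes */
#include <assert.h>
#include <malloc.h>       /* mallopt, M_TRIM_THRESHOLD, M_MMAP_MAX (glibc) */
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>       /* sbrk */

/*
 * Hypothetical helper: ask malloc not to give memory back to the OS,
 * then check whether it appears to comply.
 * Returns 1 if the heap did not shrink after free(), 0 otherwise.
 * Proxy only: a real check would look at whether munmap hooks fired.
 */
int hints_keep_memory(size_t nbytes)
{
    assert(nbytes > 0);

    /* mallopt() returns 1 on success, 0 on error; it is only a hint. */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1) {   /* never trim the heap top */
        return 0;
    }
    if (mallopt(M_MMAP_MAX, 0) != 1) {          /* do not mmap big blocks */
        return 0;
    }

    void *p = malloc(nbytes);
    if (p == NULL) {
        return 0;
    }
    /* Touch the block through a volatile pointer so the compiler cannot
     * optimize the malloc/free pair away. */
    volatile unsigned char *vp = p;
    vp[0] = 1;

    void *brk_before = sbrk(0);
    free(p);
    void *brk_after = sbrk(0);

    if (brk_before == (void *) -1 || brk_after == (void *) -1) {
        return 0;
    }
    /* If the break moved down, malloc returned memory despite the hints. */
    return (uintptr_t) brk_after >= (uintptr_t) brk_before;
}
```

Something like hints_keep_memory(4 * 1024 * 1024) would run once at
startup, and leave_pinned would be enabled only if it returns 1.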
*** brian's proposal for v1.3:
- disable building ptmalloc2 unless specifically requested
- add a component for intercepting munmap
- enable mallopt by default (currently in the mpool base) if all of
the following is true:
- you are using the munmap-intercept component (we can check
this at run-time)
- leave_pinned was requested
- mallopt hints work
--------------------------
- gleb: random note: if you call free from a callback in a threaded
build, we can deadlock
- brian: because OpenFabrics unregister calls malloc/free, and this
causes problems. we added a hack-ish loop to try to handle this.
probably not completely correct; don't really know what *to* do.
- gleb: solved in openib btl -- we simply don't unreg on callback
(just save it on a list to unregister later). but there are other
places it can/does happen.
- brian: yes, it's likely to be a big problem to clean up. unlikely
to happen for v1.3.
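
A minimal sketch of the "save it on a list, unregister later" idea
gleb describes, under some assumptions: this is not the openib BTL
code, and all names here (defer_unregister, drain_deferred,
fake_unreg, DEFERRED_MAX) are invented for illustration. The list is a
fixed-size array so that the callback path never calls malloc/free
itself, and the lock is dropped before the real unregister runs, so an
unregister that calls free() can safely re-enter the callback.

```c
#include <pthread.h>
#include <stddef.h>

#define DEFERRED_MAX 64   /* arbitrary capacity for this sketch */

typedef struct {
    void  *base;
    size_t len;
} region_t;

/* Signature of the "real" unregister routine (assumed for the sketch). */
typedef void (*unreg_fn)(void *base, size_t len);

static region_t        deferred[DEFERRED_MAX];
static size_t          deferred_count = 0;
static pthread_mutex_t deferred_lock  = PTHREAD_MUTEX_INITIALIZER;

/* Called from the memory-release callback: record only, no malloc/free.
 * Returns 1 if the region was queued, 0 if the list is full. */
int defer_unregister(void *base, size_t len)
{
    int ok = 0;
    pthread_mutex_lock(&deferred_lock);
    if (deferred_count < DEFERRED_MAX) {
        deferred[deferred_count].base = base;
        deferred[deferred_count].len  = len;
        deferred_count++;
        ok = 1;
    }
    pthread_mutex_unlock(&deferred_lock);
    return ok;
}

/* Called later from a safe context (outside any callback).
 * Returns the number of regions handed to unreg. */
size_t drain_deferred(unreg_fn unreg)
{
    region_t local[DEFERRED_MAX];
    size_t n, i;

    /* Copy the pending entries out, then release the lock before
     * calling unreg, which may itself call malloc/free. */
    pthread_mutex_lock(&deferred_lock);
    n = deferred_count;
    for (i = 0; i < n; i++) {
        local[i] = deferred[i];
    }
    deferred_count = 0;
    pthread_mutex_unlock(&deferred_lock);

    for (i = 0; i < n; i++) {
        unreg(local[i].base, local[i].len);
    }
    return n;
}

/* Stand-in for a driver deregistration call; it only counts calls. */
size_t fake_unreg_calls = 0;
void fake_unreg(void *base, size_t len)
{
    (void) base;
    (void) len;
    fake_unreg_calls++;
}
```

The sketch leaves open what to do when the list is full; a real
implementation would need a policy for that, and this covers only one
of the places gleb says the problem can occur.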
--
Jeff Squyres
Cisco Systems