brian: 1st point: propose remove opal/mca/memory/darwin (memory hooks
  on OS X).  Rationale:
  - mvapi support is gone
  - gm would be only user
  - no one is supporting the code anymore (it ain't broke, but...)
  --> patrick says: no problem.  only myri osx customers have a special
      mpich-mx, so it's ok.
  --> jeff will svn rm mca/memory/darwin

discussion about current state of ptmalloc2
- only really useful for benchmarks (i.e., --mca mpi_leave_pinned 1)
- why have it in the way for apps that don't use mpi_leave_pinned?
- it gets in the way of MX (we "sorta" get away with it)
- also, we can't use ptmalloc2 for sun -- would be nice to do
  something that they can use
- also remember that we hacked our copy of ptmalloc2 to make it work
  nicely (e.g., because OF deregister calls malloc/free)
  - note that our ptmalloc2 hacks are basically equivalent to mallopt:
    we rarely return memory to the OS (only in cases such as very large
    allocations, where ptmalloc uses its munmap path)
  --> brian will double check this point

4 proposals:

1. patrick proposes to use the MMU notifiers -- likely to be in linux
  2.6.27
  - network driver will need to implement reg cache functionality
  - these MMU notifiers will not be visible to OMPI; OMPI simply
    *always* registers (a system call) and the driver implements the
    cache and does the de-register for you when the memory is freed
  - gleb asks: don't we want to avoid the system call when possible?
  - patrick: a single syscall can be/is cheaper than a reg cache
    lookup in user space

2. patrick also proposes dlmalloc
  - not as efficient as ptmalloc2 (no fine-grained thread locks)
  - but is more robust and simpler than ptmalloc2 (mpich-mx switched
    to it long ago)
  - has the same linker issues as ptmalloc2 (e.g., will be problematic
    with apps that require their own allocator)
  --> better for longer term (e.g., OMPI v1.4) because dlmalloc
      handles large numbers of short malloc/free's better than
      ptmalloc*
  --> upgrading to dlmalloc is also subject to points at bottom of
      these notes (don't call free() during de-register code paths)

3. brian proposes mallopt
  - patrick says: you have to check whether the memory being registered
    is on the stack.  what do we do now?
  - neither brian nor galen remembers offhand; we'll need to check
  - we will have problems with apps that do lots of small allocations,
    but still better than ptmalloc2 because can turn off mallopt via
    MCA param (i.e., just tell users: "don't use mpi_leave_pinned")
    instead of recompiling/reinstalling OMPI to disable ptmalloc2

4. patrick also mentions: can simply use pipeline (take the bw perf
  hit).  Unfortunately, not feasible for benchmarks.  :-(

---------------------------

For v1.3, gravitating towards the following: leave ptmalloc2 as component
  in the v1.3 tarball, but don't build it unless explicitly requested,
  and ensure that the mallopt() protocol stuff works.

  - note that the mallopt code is currently enabled by 2nd mca param
  - patrick: no guarantee that malloc will comply; it's only a hint.
    need to have a run-time test to ensure that it works: set the trim
    threshold to a large value.  then malloc something just over the
    threshold and free it, and see if munmap hooks were called.
  - brian: we'll need to add the hooks for munmap (probably move them
    from where they are currently located)
  - patrick: what about case like CHARMM where they have their own
    allocator and don't support mallopt() hints?
  - brian: same as today -- if you provide your own allocator,
    leave_pinned doesn't work.  benefit here is that if you're *not*
    using leave_pinned, then don't have heavyweight ptmalloc2 in the
    way.  but you are hosed if you try to have your own allocator with
    leave_pinned.
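
sketch of the run-time check patrick describes (not the actual OMPI
code; glibc-only).  assumptions: the function name and the probe size
are made up for illustration.  there are no munmap hooks here, so this
version instead compares the program break via sbrk(0) before and
after free().  it also sets M_MMAP_MAX to 0 so the probe comes from the
heap rather than from mmap.

```c
#define _DEFAULT_SOURCE   /* for sbrk() under strict -std= modes */
#include <assert.h>
#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 1 if glibc accepted the hints and the heap did not shrink
 * when a large block was freed, 0 otherwise.
 * Side effect: changes malloc behavior for the whole process.
 * Limitation: only watches the brk heap, not mmap'ed regions.
 */
int mallopt_hints_work(size_t probe_len)
{
    assert(probe_len > 0);

    /* ask malloc never to trim the top of the heap ... */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1) return 0;
    /* ... and never to satisfy requests with mmap */
    if (mallopt(M_MMAP_MAX, 0) != 1) return 0;

    /* volatile keeps the compiler from eliding the malloc/free pair */
    char *volatile p = malloc(probe_len);
    if (p == NULL) return 0;
    memset(p, 0, probe_len);   /* touch the memory */

    void *brk_before = sbrk(0);
    free(p);
    void *brk_after = sbrk(0);

    /* if the hint was honored, free() did not give memory back */
    return brk_after == brk_before;
}
```

usage would be something like mallopt_hints_work(4 * 1024 * 1024) once
at startup, and only if leave_pinned was requested; a 0 result means
fall back to not leaving memory pinned.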

*** brian's proposal for v1.3:
  - disable building ptmalloc2 unless specifically requested
  - add a component for intercepting munmap
  - enable mallopt by default (currently in the mpool base) if all of
    the following are true:
      - you are using the munmap-intercept component (we can check
        this at run-time)
      - leave_pinned was requested
      - mallopt hints work
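
the three conditions above boil down to a single AND.  a minimal
sketch, assuming hypothetical names (the struct, its fields, and the
function are not from the OMPI tree):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; none of these names come from the OMPI tree. */
struct leave_pinned_inputs {
    bool munmap_intercept_loaded;  /* munmap-intercept component in use */
    bool leave_pinned_requested;   /* user asked for mpi_leave_pinned */
    bool mallopt_hints_work;       /* run-time mallopt test passed */
};

/* mallopt is enabled only when all three conditions hold. */
bool should_enable_mallopt(const struct leave_pinned_inputs *in)
{
    assert(in != NULL);
    return in->munmap_intercept_loaded &&
           in->leave_pinned_requested &&
           in->mallopt_hints_work;
}
```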

--------------------------

- gleb: random note: if you call free from a callback in a threaded
  build, we can deadlock
  - brian: because OpenFabrics unregister calls malloc/free, and this
    causes problems.  we added a hack-ish loop to try to handle this.
    probably not completely correct; don't really know what *to* do.
  - gleb: solved in openib btl -- we simply don't unreg on callback
    (just save it on a list to unregister later).  but there are other
    places it can/does happen.
  - brian: yes, it's likely to be a big problem to cleanup.  unlikely
    to happen for v1.3.
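
sketch of the "save it on a list, unregister later" idea gleb
describes.  this is not the openib btl code; all names here (struct
reg, defer_unregister, drain_pending, mark_unregistered) are
hypothetical.  the list link lives inside the registration record, so
the callback can queue an entry without calling malloc/free.  the
sketch is single-threaded; a threaded build would need a lock or an
atomic push that does not allocate either.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration record with an intrusive list link. */
struct reg {
    void *base;
    size_t len;
    struct reg *next_pending;  /* link: no allocation needed to queue */
    int registered;
};

struct pending_list {
    struct reg *head;
};

/* Called from the free/munmap callback: must not call malloc/free. */
void defer_unregister(struct pending_list *pl, struct reg *r)
{
    assert(pl != NULL && r != NULL);
    r->next_pending = pl->head;
    pl->head = r;
}

/* Called later from a safe context; returns the number of entries
 * handed to the unregister function. */
size_t drain_pending(struct pending_list *pl,
                     void (*unregister)(struct reg *))
{
    size_t n = 0;
    struct reg *r = pl->head;

    pl->head = NULL;
    while (r != NULL) {
        struct reg *next = r->next_pending;
        r->next_pending = NULL;
        unregister(r);   /* may call malloc/free safely here */
        n++;
        r = next;
    }
    return n;
}

/* Stand-in for the real de-register call. */
void mark_unregistered(struct reg *r)
{
    r->registered = 0;
}
```

the callback would only call defer_unregister(); something outside the
callback path (e.g., the next registration attempt) would call
drain_pending(&list, mark_unregistered).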

--
Jeff Squyres
Cisco Systems
