Brian states:
"This will
allow users to turn ptmalloc2 support on/off at application link time
instead of MPI compile time."
I assume "MPI compile time" means the time at which the MPI *implementation*
is compiled.
So what about LD_PRELOAD? Can the user defer the decision to use
ptmalloc until application launch?
If so, this raises the question of an mpirun option to "enable
leave_pinned, placing libompi-malloc.so in LD_PRELOAD if required".
Can/will/should such an option exist?
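To make the question concrete, here is a rough sketch in C of what a launcher would have to do. It assumes, without confirmation from anything in this thread, that preloading libompi-malloc.so would be enough to interpose ptmalloc2. The names prepend_preload() and launch_with_preload() are hypothetical and are not part of mpirun.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: build a new LD_PRELOAD value with `lib` placed
 * first. ld.so accepts a colon-separated list, so any value the user
 * already set is kept after ours. Returns 0 on success, -1 if `out`
 * is too small to hold the result. */
int prepend_preload(const char *lib, const char *existing,
                    char *out, size_t out_len)
{
    int n;

    if (existing != NULL && existing[0] != '\0')
        n = snprintf(out, out_len, "%s:%s", lib, existing);
    else
        n = snprintf(out, out_len, "%s", lib);

    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Hypothetical launcher step: export the new LD_PRELOAD and exec the
 * application named by argv[0]. Returns -1 only if something failed,
 * since a successful execvp() does not return. */
int launch_with_preload(const char *lib, char *const argv[])
{
    char value[4096];

    if (prepend_preload(lib, getenv("LD_PRELOAD"), value, sizeof value) != 0)
        return -1;
    if (setenv("LD_PRELOAD", value, 1) != 0)
        return -1;

    execvp(argv[0], argv);
    return -1;
}
```

The point of the sketch is only that the decision can be made in the environment of the launched process, with no relinking of the application.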
-Paul
Brian W. Barrett wrote:
Hi all -
Sorry this is so late, but it took a couple of iterations with a couple of
people to get right from a technology standpoint. All mistakes in this
proposal are my fault.
What: Fix the ptmalloc2 problem
How: Remove it from the default path
When: This weekend? For the 1.3 branch
The problem: On Linux today, we by default build a copy of ptmalloc2 into
libopen-pal.so so that RDMA networks can get better bandwidth using
leave_pinned. Normally users don't use or need leave_pinned, but we need
to have it available for benchmarks and the few apps that gain by having
the more independent-ish progress. However, by having it there, we're
screwing with the memory manager, which has a number of bad side effects.
First, it can cause numerous crashes if the user is providing his/her own
allocator. Second, there is growing evidence that the ptmalloc2 in Open
MPI has an evil corner case we can't pin down that causes explosive
growth in memory utilization.
The proposal: Remove ptmalloc2 from libopen-pal.so and make it a
standalone library (for forward compatibility, currently called
libompi-malloc.so), which the user can explicitly link in. This will
allow users to turn ptmalloc2 support on/off at application link time
instead of MPI compile time. Given the limited number of leave_pinned
users, this seems to be a good compromise for the near-term between
greater stability for most users and fast performance for power users.
The mallopt() solution, in which free() never gives memory back to the
OS (but does reuse it) and which works well for benchmarks, will still be
available at all times.
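As an illustration of the mallopt() approach, the following is a minimal sketch using glibc's standard tuning parameters. It assumes that "never gives memory back to the OS" corresponds to disabling heap trimming and mmap()-backed allocations; the message does not say which parameters are actually set, and the function name is hypothetical.

```c
#include <malloc.h>   /* glibc-specific: mallopt(), M_TRIM_THRESHOLD, M_MMAP_MAX */

/* Hypothetical helper: ask glibc malloc to keep freed memory inside the
 * process so that pinned pages are never unmapped behind our back.
 * Returns 1 if both settings were accepted, 0 otherwise. */
int keep_freed_memory_in_process(void)
{
    /* -1 disables trimming, so free() never shrinks the heap. */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);

    /* 0 disables mmap() for large requests, so free() never has
     * anything to munmap(). */
    int ok_mmap = mallopt(M_MMAP_MAX, 0);

    return ok_trim && ok_mmap;
}
```

A call like this would need to happen early, before the allocations that are going to be registered. Freed memory stays available for reuse by later malloc() calls, which is why the approach suits benchmarks.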
The work: Some autoconf magic to move most (but not all -- in particular
the munmap() support) of the ptmalloc2 component into its own library.
This is extremely low risk, and actually lowers the risk of Open MPI
breaking by removing code from the critical path. There will also be a
small number of enhancements to the mpool base code to better detect
situations where leave_pinned is used but we can't sense memory being given
back to the OS.
I'd like this for 1.3, as we're running into more and more situations
where this code isn't working. Also, the lone supporter of the ptmalloc2
code (me) doesn't want to do it anymore, and removing the code from the
critical path will lower my workload (i.e., that of the retired guy who's
doing this for fun).
If you have objections, please let me know before Friday. I'd like to
commit these changes to the trunk this weekend.
Brian
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900