Brian states:
This will allow users to turn ptmalloc2 support on/off at application link time instead of MPI compile time.
I assume "MPI compile time" means when the MPI *implementation* is compiled.

So what about LD_PRELOAD? Can the user defer the decision to use ptmalloc until application launch? If so, this raises the question of an mpirun option to "enable leave_pinned, placing libompi-malloc.so in LD_PRELOAD if required". Can/will/should such an option exist?

-Paul


Brian W. Barrett wrote:
Hi all -

Sorry this is so late, but it took a couple of iterations with a couple of
people to get right from a technology standpoint.  All mistakes in this
proposal are my fault.

What: Fix the ptmalloc2 problem
How: Remove it from the default path
When: This weekend?  For the 1.3 branch

The problem: On Linux today, we by default build a copy of ptmalloc2 into libopen-pal.so so that RDMA networks can get better bandwidth using leave_pinned. Normally users don't use or need leave_pinned, but we need to have it available for benchmarks and the few apps that gain by having the more independent-ish progress. However, by having it there, we're screwing with the memory manager, which has a number of bad side effects. First, it can cause numerous crashes if the user is providing his/her own allocator. Second, there is growing evidence that the ptmalloc2 in Open MPI has an evil corner case we can't pin down that causes explosive growth in memory utilization.

The proposal: Remove ptmalloc2 from libopen-pal.so and make it a standalone library (for forward compatibility, currently called libompi-malloc.so), which the user can explicitly link in. This will allow users to turn ptmalloc2 support on/off at application link time instead of MPI compile time. Given the limited number of leave_pinned users, this seems to be a good compromise in the near term between greater stability for most users and fast performance for power users. The mallopt() solution, in which free() never gives memory back to the OS (but does reuse it) and which works well for benchmarks, will still be available at all times.

The work: Some autoconf magic to move most (but not all -- in particular the munmap() support) of the ptmalloc2 component into its own library. This is extremely low risk, and actually lowers the risk of Open MPI breaking by removing code from the critical path. There will also be a small number of enhancements to the mpool base code to better detect situations where leave_pinned is used but we can't sense memory being given back to the OS.

I'd like this for 1.3, as we're running into more and more situations where this code isn't working. Also, the lone supporter of the ptmalloc2 code (me) doesn't want to do it anymore, and removing the code from the critical path will lower my workload (i.e., the retired guy who's doing this for fun).

If you have objections, please let me know before Friday. I'd like to commit these changes to the trunk this weekend.

Brian
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
