What: We never call del_procs for the procs in comm world. As a result, we leak the bml endpoints created by r2.

The proposed solution is not ideal, but it avoids adding a call to del_procs for comm world, something I know would require more discussion since there is likely a reason it is not called today. I propose we release any remaining bml endpoints when we tear down the ompi_proc_t:

diff --git a/ompi/proc/proc.c b/ompi/proc/proc.c
index f549335..9ea0311 100644
--- a/ompi/proc/proc.c
+++ b/ompi/proc/proc.c
@@ -89,6 +89,13 @@ void ompi_proc_destruct(ompi_proc_t* proc)
     OPAL_THREAD_LOCK(&ompi_proc_lock);
     opal_list_remove_item(&ompi_proc_list, (opal_list_item_t*)proc);
     OPAL_THREAD_UNLOCK(&ompi_proc_lock);
+
+#if defined(OMPI_PROC_ENDPOINT_TAG_BML)
+    /* release the bml endpoint if it still exists */
+    if (proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_BML]) {
+        OBJ_RELEASE(proc->proc_endpoints[OMPI_PROC_ENDPOINT_TAG_BML]);
+    }
+#endif
 }

This fixes the leak and appears to have no negative side effects for r2.

Why: Trying to clean up the last remaining leaks in the Open MPI code base. This is one of the larger ones, as it grows with the size of comm world.

When: I want this to go into 1.8.2 if possible. Setting a short timeout of 1 week.

Keep in mind I do not know the full history of add_procs/del_procs, so there may be a better way to fix this. This RFC is meant to open the discussion about how to address this leak. If the above fix looks ok I will commit it.

-Nathan
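
To make the ownership issue concrete, here is a minimal standalone sketch of the pattern. It is a toy model and not Open MPI code: every toy_* name is hypothetical, the reference count is a plain int rather than the OBJ_RETAIN/OBJ_RELEASE machinery, and toy_add_proc only stands in for the point where r2 attaches an endpoint to a proc. The assumption it encodes is the one described above: an endpoint is created per proc, no del_procs-style call ever drops it, so the proc destructor is the last place that can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins only; none of these names exist in Open MPI. */
typedef struct toy_endpoint {
    int refcount;
} toy_endpoint_t;

typedef struct toy_proc {
    toy_endpoint_t *endpoint;  /* plays the role of the bml endpoint slot */
} toy_proc_t;

static int live_endpoints = 0;  /* endpoints allocated and not yet freed */

static toy_endpoint_t *toy_endpoint_new(void)
{
    toy_endpoint_t *ep = malloc(sizeof(*ep));
    if (NULL == ep) {
        abort();
    }
    ep->refcount = 1;
    ++live_endpoints;
    return ep;
}

static void toy_endpoint_release(toy_endpoint_t *ep)
{
    assert(ep->refcount > 0);
    if (0 == --ep->refcount) {
        free(ep);
        --live_endpoints;
    }
}

/* Stand-in for add_procs: attach a freshly created endpoint to the proc. */
static void toy_add_proc(toy_proc_t *proc)
{
    proc->endpoint = toy_endpoint_new();
}

/* Stand-in for the proc destructor. With release_leftover set, it drops
 * any endpoint still attached, which is the behaviour the patch adds. */
static void toy_proc_destruct(toy_proc_t *proc, int release_leftover)
{
    if (release_leftover && NULL != proc->endpoint) {
        toy_endpoint_release(proc->endpoint);
        proc->endpoint = NULL;
    }
}

/* Create nprocs procs, tear them down without any del_procs-style call,
 * and return how many endpoints this run left behind. When
 * release_leftover is 0 the endpoints are leaked on purpose. */
static int toy_run(int nprocs, int release_leftover)
{
    int before = live_endpoints;
    int i;
    toy_proc_t *procs;

    if (nprocs <= 0) {
        return 0;
    }
    procs = calloc((size_t)nprocs, sizeof(*procs));
    if (NULL == procs) {
        abort();
    }
    for (i = 0; i < nprocs; ++i) {
        toy_add_proc(&procs[i]);
    }
    for (i = 0; i < nprocs; ++i) {
        toy_proc_destruct(&procs[i], release_leftover);
    }
    free(procs);
    return live_endpoints - before;
}
```

In this model toy_run(4, 0) returns 4 (one leaked endpoint per proc, so the leak scales with the number of procs), while toy_run(4, 1) returns 0. The NULL check in toy_proc_destruct mirrors the guard in the patch, so a proc whose endpoint was already released elsewhere is left alone.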